Parallel Support Vector Machines on Multi-Core and Multiprocessor Systems

This paper proposes a new and efficient parallel implementation of support vector machines, based on the decomposition method, for handling large-scale datasets. The parallelization targets the most time- and memory-consuming part of training, namely updating the vector f. The inner subproblems are handled by a sequential minimal optimization (SMO) solver. Since the underlying parallelism is realized with a shared-memory version of the Map-Reduce paradigm, our system is easy to build and particularly well suited to multi-core and multiprocessor systems. Experimental results show that on most of the tested datasets our system achieves more than a fourfold speedup over Libsvm, and it is also far more efficient than the MPI implementation Pisvm.

Keywords: support vector machine; parallel; multi-core; Map-Reduce

I. INTRODUCTION

Support vector machine (SVM) is a popular supervised learning method for solving classification and regression problems [1]. It shows robust generalization ability in numerous applications such as image processing, text mining, neural analysis, and energy-efficiency modeling [2], [3]. Training an SVM is essentially a quadratic optimization problem that is both time and memory intensive, which makes applying SVM to large-scale problems challenging. Several optimization and heuristic methods have been proposed to accelerate training and reduce memory consumption, such as shrinking, chunking [4], kernel caching [5], and approximation of the kernel matrix [6]. In addition, certain scalable solvers can be used, such as sequential minimal optimization (SMO) [7], mixtures of SVMs [8], and the primal estimated sub-gradient solver [9]. Despite these efforts, however, a more satisfactory solution to this challenging research problem is still expected.

Thanks to modern chip manufacturing, we are entering the multi-core era: computers with multiple cores or multiple processors are becoming ever more available and affordable. This paper aims to investigate and demonstrate how SVM, a popular machine learning algorithm, can benefit from this modern platform. A new parallel SVM that is particularly suitable for shared-memory systems is proposed. The decomposition method, kernel caching, and an SMO solver for the inner quadratic subproblems are combined as the key techniques of the implementation. To achieve an easy implementation without sacrificing performance, the state-of-the-art parallel programming framework Map-Reduce is chosen to perform the underlying parallelism. The system is therefore called MRPsvm, which stands for "Map-Reduce parallel SVM". Comparative system analysis and experimental results on benchmark datasets show significant memory savings and a substantial speed advantage for our system.

The remainder of the paper is organized as follows. Section II introduces the basic SVM training problem and how the decomposition method solves it. Section III explains the key points of the system implementation. Section IV describes related work. Section V presents the numerical experiments on benchmark datasets and the corresponding results. Conclusions are drawn in Section VI.

II. SUPPORT VECTOR MACHINE

SVM for classification aims at finding a hyperplane that separates two classes with maximum margin. Let x_i denote the i-th training sample and y_i ∈ {−1, +1} the corresponding label, i = 1, 2, ..., l, where l is the total number of training samples. The dual form of the SVM can be written as the following convex quadratic program:

min_α (1/2) α^T Q α − e^T α
s.t. y^T α = 0, 0 ≤ α_i ≤ C, i = 1, ..., l,

where e is the vector of all ones, C > 0 is the regularization parameter, and Q is the l × l matrix with Q_ij = y_i y_j K(x_i, x_j) for a kernel function K.
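To make the parallel scheme concrete, the sketch below shows one decomposition iteration in this style. It assumes, as in Libsvm-style solvers, that f is the gradient of the dual objective, f = Qα − e, so that after the working set B = {i, j} is re-optimized, every component must be refreshed by f_t ← f_t + Σ_{j∈B} Q_tj Δα_j. This O(l) sweep over kernel rows is the dominant cost and is what the Map phase splits across cores, while working-set selection is a Reduce over per-slice candidates. All names and parameters here are illustrative assumptions, not MRPsvm's actual code, and Python threads with NumPy merely stand in for the shared-memory Map-Reduce runtime; for brevity the working set is fixed to size two, which reduces the inner SMO solve to a single analytic step.

```python
# Illustrative sketch only: names and structure are assumptions, not MRPsvm's API.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def rbf_rows(X, B, gamma):
    """Rows K[B, :] of the RBF kernel matrix, computed on demand
    (a real solver would consult a kernel cache here)."""
    d2 = ((X[B][:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def map_select(f, y, alpha, C, lo, hi):
    """MAP task: per-slice candidates for the maximal-violating pair."""
    yf = -y[lo:hi] * f[lo:hi]
    up = ((y[lo:hi] > 0) & (alpha[lo:hi] < C)) | ((y[lo:hi] < 0) & (alpha[lo:hi] > 0))
    low = ((y[lo:hi] > 0) & (alpha[lo:hi] > 0)) | ((y[lo:hi] < 0) & (alpha[lo:hi] < C))
    u = np.where(up, yf, -np.inf)
    d = np.where(low, yf, np.inf)
    iu, il = int(np.argmax(u)), int(np.argmin(d))
    return u[iu], lo + iu, d[il], lo + il

def map_update_f(f, y, K_B, yB, dalpha, lo, hi):
    """MAP task: f_t += sum_{j in B} y_t y_j K_tj * dalpha_j on slice [lo, hi).
    Slices are disjoint, so the shared array f needs no locking."""
    f[lo:hi] += y[lo:hi] * ((yB * dalpha) @ K_B[:, lo:hi])

def clip_step(t, alpha, y, i, j, C):
    """Largest feasible step along the pair: alpha_i moves by +y_i*t and
    alpha_j by -y_j*t, and both must stay inside the box [0, C]."""
    lo, hi = -np.inf, np.inf
    for idx, s in ((i, y[i]), (j, -y[j])):
        a, b = -alpha[idx] / s, (C - alpha[idx]) / s
        lo, hi = max(lo, min(a, b)), min(hi, max(a, b))
    return float(np.clip(t, lo, hi))

def train(X, y, C=1.0, gamma=0.5, eps=1e-3, workers=4, max_iter=100000):
    l = len(y)
    alpha, f = np.zeros(l), -np.ones(l)   # f = Q*alpha - e at alpha = 0
    chunks = [(k * l // workers, (k + 1) * l // workers) for k in range(workers)]
    with ThreadPoolExecutor(workers) as pool:
        for _ in range(max_iter):
            # REDUCE: merge per-slice candidates into the global violating pair.
            cand = list(pool.map(lambda c: map_select(f, y, alpha, C, *c), chunks))
            up, i = max((c[0], c[1]) for c in cand)
            low, j = min((c[2], c[3]) for c in cand)
            if up - low < eps:            # optimality gap small enough
                break
            # Inner subproblem on B = {i, j}, solved analytically (SMO step).
            K_B = rbf_rows(X, [i, j], gamma)
            quad = max(K_B[0, i] + K_B[1, j] - 2.0 * K_B[0, j], 1e-12)
            t = clip_step((up - low) / quad, alpha, y, i, j, C)
            dalpha = np.array([y[i] * t, -y[j] * t])
            alpha[i] += dalpha[0]
            alpha[j] += dalpha[1]
            # MAP: the dominant O(l) update of f, split across the workers.
            done = [pool.submit(map_update_f, f, y, K_B, y[[i, j]], dalpha, *c)
                    for c in chunks]
            for fut in done:
                fut.result()              # barrier before the next iteration
    return alpha
```

Called as train(X, y) with X an l × d NumPy array and y a vector of ±1 labels, the sketch returns the dual variables α; indices with α_i > 0 are the support vectors. A production solver along the paper's lines would add kernel caching, shrinking, and larger working sets on top of this loop.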

REFERENCES

[1] S. Sathiya Keerthi et al., "Parallel sequential minimal optimization for the training of support vector machines," IEEE Trans. Neural Networks, 2006.

[2] Dominik Brugger et al., "Parallel Support Vector Machines," 2006.

[3] Edward Y. Chang et al., "Parallelizing Support Vector Machines on Distributed Computers," NIPS, 2007.

[4] Naga K. Govindaraju et al., "Mars: A MapReduce framework on graphics processors," Proc. Int. Conf. on Parallel Architectures and Compilation Techniques (PACT), 2008.

[5] Douglas Stott Parker et al., "Map-reduce-merge: simplified relational data processing on large clusters," Proc. SIGMOD, 2007.

[6] Chih-Jen Lin et al., "Working Set Selection Using Second Order Information for Training Support Vector Machines," J. Mach. Learn. Res., 2005.

[7] Samy Bengio et al., "A Parallel Mixture of SVMs for Very Large Scale Problems," Neural Computation, 2001.

[8] Kunle Olukotun et al., "Map-Reduce for Machine Learning on Multicore," NIPS, 2006.

[9] Randy H. Katz et al., "Improving MapReduce Performance in Heterogeneous Environments," Proc. OSDI, 2008.

[10] Thorsten Joachims et al., "Making large-scale support vector machine learning practical," 1999.

[11] Frédéric Magoulès et al., "Parallel Support Vector Machines Applied to the Prediction of Multiple Buildings Energy Consumption," 2010.

[12] Kurt Keutzer et al., "Fast support vector machine training and classification on graphics processors," Proc. ICML, 2008.

[13] Zheng Chen et al., "P-packSVM: Parallel Primal grAdient desCent Kernel SVM," Proc. IEEE Int. Conf. on Data Mining (ICDM), 2009.

[14] Chih-Jen Lin et al., "LIBSVM: A library for support vector machines," ACM TIST, 2011.

[15] Federico Girosi et al., "Training support vector machines: an application to face detection," Proc. IEEE CVPR, 1997.

[16] Vladimir N. Vapnik et al., "The Nature of Statistical Learning Theory," Statistics for Engineering and Information Science, 2000.

[17] Hao Wang et al., "PSVM: Parallelizing Support Vector Machines on Distributed Computers," 2007.

[18] Tamir Hazan et al., "A Parallel Decomposition Solver for SVM: Distributed dual ascend using Fenchel Duality," Proc. IEEE CVPR, 2008.

[19] Christoforos E. Kozyrakis et al., "Evaluating MapReduce for Multi-core and Multiprocessor Systems," Proc. IEEE Int. Symp. on High Performance Computer Architecture (HPCA), 2007.

[20] Danny Dolev et al., "A Gaussian Belief Propagation Solver for Large Scale Support Vector Machines," arXiv, 2008.

[21] Sanjay Ghemawat et al., "MapReduce: Simplified Data Processing on Large Clusters," Proc. OSDI, 2004.

[22] Jie Pan et al., "Parallelizing multiple group-by query in share-nothing environment: a MapReduce study case," Proc. HPDC, 2010.

[23] Jian-xiong Dong et al., "Fast SVM training algorithm with decomposition on very large data sets," IEEE Trans. Pattern Analysis and Machine Intelligence, 2005.

[24] Frédéric Magoulès et al., "Vapnik's learning theory applied to energy consumption forecasts in residential buildings," Int. J. Comput. Math., 2008.

[25] John C. Platt et al., "Fast training of support vector machines using sequential minimal optimization," in Advances in Kernel Methods, 1999.

[26] Luca Zanni et al., "Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems," J. Mach. Learn. Res., 2006.

[27] Yoram Singer et al., "Pegasos: primal estimated sub-gradient solver for SVM," Math. Program., 2011.

[28] Igor Durdanovic et al., "Parallel Support Vector Machines: The Cascade SVM," NIPS, 2004.