A Distributed Ensemble Scheme for Nonlinear Support Vector Machines

We propose an ensemble scheme with a parallel computational structure, which we call the Distributed Ensemble Support Vector Machine (DESVM), to overcome the practical difficulties of large-scale nonlinear Support Vector Machines (SVMs). The dataset is split into many stratified partitions; each partition may still be too large to solve with conventional SVM solvers. We apply the reduced kernel trick to generate a nonlinear SVM classifier for each partition, which can be treated as an approximation model based on that partial dataset. We then use a linear SVM classifier to fuse the nonlinear SVM classifiers generated from all data partitions. In this linear SVM training model, each nonlinear SVM classifier is treated as an "attribute" or an "expert". In the ensemble phase, DESVM produces a fusion model that is a weighted combination of the nonlinear SVM classifiers, which can be interpreted as a weighted voting decision made by a group of experts. We test the proposed method on five benchmark datasets. The numerical results show that DESVM is competitive in accuracy and achieves high speed-up. Thus, DESVM can be a powerful tool for binary classification problems with large-scale datasets that are not linearly separable.
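The pipeline described above (stratified partitioning, one nonlinear SVM per partition, linear-SVM fusion over the experts' outputs) can be sketched as follows. This is a minimal illustration using scikit-learn's `SVC`, `LinearSVC`, and `StratifiedKFold` as stand-ins; the paper's actual method uses the reduced kernel trick for each partition's solver, which is not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.svm import SVC, LinearSVC

# A small, not-linearly-separable binary problem as a stand-in dataset.
X, y = make_moons(n_samples=2000, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# 1) Split the training data into stratified partitions.
experts = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for _, idx in skf.split(X_tr, y_tr):
    # 2) Fit one nonlinear (RBF-kernel) SVM per partition; in DESVM each
    #    of these would be a reduced-kernel SVM to keep the solve cheap.
    experts.append(SVC(kernel="rbf", gamma=1.0).fit(X_tr[idx], y_tr[idx]))

def expert_scores(X):
    # Each expert's decision value becomes one "attribute" for the fuser.
    return np.column_stack([e.decision_function(X) for e in experts])

# 3) A linear SVM learns a weighted vote over the experts' outputs.
fuser = LinearSVC().fit(expert_scores(X_tr), y_tr)
acc = fuser.score(expert_scores(X_te), y_te)
print(f"fused test accuracy: {acc:.2f}")
```

The fuser's learned coefficients play the role of the per-expert voting weights; experts whose partitions yield poor approximations receive smaller weight in the final decision.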
