Accelerating the Training Process of Support Vector Machines by Random Partition

In this paper we present Random Partition based SVM (RPSVM), a novel method for speeding up SVM training. Instead of clustering the training data prior to training, RPSVM randomly partitions the training data into several clusters and uses the cluster centers to train an initial SVM. This initial SVM is then used to identify critical clusters, i.e. clusters that lie on the decision boundary. The same procedure is applied to each critical cluster, yielding a refined SVM built from the support vectors found in the initial round of training together with those found in subsequent rounds. The procedure is repeated recursively until no critical cluster remains, producing the final SVM. Our experiments on synthetic and real data sets show that RPSVM scales to large data sets while retaining high classification accuracy. A hedged sketch of one refinement round is given below.
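
The following is a minimal sketch, in Python with scikit-learn, of one round of the procedure described above. It assumes binary labels and an RBF kernel; the function and parameter names (rpsvm_round, n_parts, margin) are illustrative assumptions rather than identifiers from the paper, and the paper's recursive refinement is indicated only by a comment.

```python
import numpy as np
from sklearn.svm import SVC

def rpsvm_round(X, y, n_parts=20, margin=1.0, seed=0):
    """One round of the random-partition idea: train on cluster centers,
    then refine using clusters whose centers fall near the boundary."""
    rng = np.random.default_rng(seed)
    centers, labels, members = [], [], []

    # Randomly partition each class into n_parts clusters and take their centers.
    for cls in np.unique(y):
        idx = rng.permutation(np.where(y == cls)[0])
        for part in np.array_split(idx, n_parts):
            if len(part):
                centers.append(X[part].mean(axis=0))
                labels.append(cls)
                members.append(part)
    centers, labels = np.asarray(centers), np.asarray(labels)

    # Train an initial SVM on the cluster centers only.
    init_svm = SVC(kernel="rbf").fit(centers, labels)

    # A cluster is "critical" if its center lies inside the margin band,
    # i.e. close to the decision boundary of the initial SVM.
    critical = np.abs(init_svm.decision_function(centers)) < margin

    # Refined training set: support-vector centers plus all original points
    # belonging to critical clusters (the paper applies this step recursively
    # until no critical cluster remains).
    sv_centers = centers[init_svm.support_]
    sv_labels = labels[init_svm.support_]
    if critical.any():
        crit_idx = np.concatenate([members[i] for i in np.where(critical)[0]])
        X_ref = np.vstack([sv_centers, X[crit_idx]])
        y_ref = np.concatenate([sv_labels, y[crit_idx]])
    else:
        X_ref, y_ref = sv_centers, sv_labels

    # Final (refined) SVM for this round.
    return SVC(kernel="rbf").fit(X_ref, y_ref)
```

The key cost saving is that the initial quadratic-programming problem is solved over at most 2 * n_parts cluster centers rather than over the full training set, and later rounds only see points drawn from clusters near the decision boundary.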
