Ensembles of budgeted kernel support vector machines for parallel large scale learning

In this work, we propose to combine multiple budgeted kernel support vector machines (SVMs) trained with stochastic gradient descent (SGD) in order to exploit large databases and parallel computing resources. The variance induced by budget restrictions of the kernel SVMs is reduced through the averaging of predictions, resulting in greater generalization performance. The variance of the trainings results in a diversity of predictions, which can help explain the better performance. Finally, the proposed method is intrinsically parallel, which means that parallel computing resources can be exploited in a straightforward manner.

[1]  Jason Weston,et al.  Fast Kernel Classifiers with Online and Active Learning , 2005, J. Mach. Learn. Res..

[2]  Barbara Caputo,et al.  Bounded Kernel-Based Online Learning , 2009, J. Mach. Learn. Res..

[3]  Rocco A. Servedio,et al.  Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms , 2001, NIPS.

[4]  Alexander J. Smola,et al.  Parallelized Stochastic Gradient Descent , 2010, NIPS.

[5]  Ivor W. Tsang,et al.  Core Vector Machines: Fast SVM Training on Very Large Data Sets , 2005, J. Mach. Learn. Res..

[6]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[7]  Ludmila I. Kuncheva,et al.  Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy , 2003, Machine Learning.

[8]  Nathan Srebro,et al.  Learning Optimally Sparse Support Vector Machines , 2013, ICML.

[9]  Patrick Gallinari,et al.  SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent , 2009, J. Mach. Learn. Res..

[10]  Giorgio Valentini,et al.  Bias-Variance Analysis of Support Vector Machines for the Development of SVM-Based Ensemble Methods , 2004, J. Mach. Learn. Res..

[11]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[12]  Yoram Singer,et al.  The Forgetron: A Kernel-Based Perceptron on a Budget , 2008, SIAM J. Comput..

[13]  Koby Crammer,et al.  Breaking the curse of kernelization: budgeted stochastic gradient descent for large-scale SVM training , 2012, J. Mach. Learn. Res..

[14]  John Langford,et al.  Slow Learners are Fast , 2009, NIPS.

[15]  Zheng Chen,et al.  P-packSVM: Parallel Primal grAdient desCent Kernel SVM , 2009, 2009 Ninth IEEE International Conference on Data Mining.