A Parallel Mixture of SVMs for Very Large Scale Problems

Support vector machines (SVMs) are state-of-the-art models for many classification problems, but they suffer from the complexity of their training algorithm, which is at least quadratic in the number of examples. It is therefore hopeless to try to solve real-life problems with more than a few hundred thousand examples using a single SVM. This article proposes a new mixture of SVMs that can easily be implemented in parallel and in which each SVM is trained on a small subset of the whole data set. Experiments on a large benchmark data set (Forest) yielded a significant reduction in training time (empirically, the time complexity appears to grow locally linearly with the number of examples). In addition, and surprisingly, a significant improvement in generalization was observed.
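
The core idea lends itself to a straightforward implementation: partition the training set into small subsets, train one SVM per subset independently (hence in parallel), and combine the experts' outputs at prediction time. The sketch below is a minimal illustration of that scheme, assuming a random partition, scikit-learn's SVC for the experts, joblib for the parallel training, and a plain average of decision values in place of the paper's trained gater; all names and hyperparameters are illustrative.

```python
# Minimal sketch of a parallel mixture of SVMs (illustrative, not the
# paper's exact method: a random partition and a uniform average of
# decision values stand in for the trained gater).
import numpy as np
from joblib import Parallel, delayed
from sklearn.svm import SVC


def train_expert(X_sub, y_sub):
    # Each expert is an ordinary kernel SVM trained on a small subset,
    # so its (at least quadratic) training cost stays manageable.
    expert = SVC(kernel="rbf", C=1.0, gamma="scale")
    expert.fit(X_sub, y_sub)
    return expert


def fit_svm_mixture(X, y, n_experts=16, n_jobs=-1, seed=0):
    # Randomly split the training set into n_experts small subsets and
    # train the experts independently, one per subset, in parallel.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    subsets = np.array_split(idx, n_experts)
    return Parallel(n_jobs=n_jobs)(
        delayed(train_expert)(X[s], y[s]) for s in subsets
    )


def predict_mixture(experts, X):
    # Combine the experts with uniform weights: average their decision
    # values and threshold (binary labels assumed to be 0/1).
    scores = np.mean([e.decision_function(X) for e in experts], axis=0)
    return (scores > 0).astype(int)
```

Because each expert only ever sees a fraction of the examples, the expensive part of SVM training is always run on small problems, which is where the overall speed-up comes from.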
