Spatial Decompositions for Large Scale SVMs

Although support vector machines (SVMs) are theoretically well understood, their underlying optimization problem becomes very expensive if, for example, hundreds of thousands of samples and a non-linear kernel are considered. Several approaches have been proposed in the past to address this serious limitation. In this work we investigate a decomposition strategy that learns on small, spatially defined data chunks. Our contributions are twofold. On the theoretical side, we establish an oracle inequality for the overall learning method using the hinge loss, and show that the resulting rates match those known for SVMs solving the complete optimization problem with Gaussian kernels. On the practical side, we compare our approach to learning SVMs on small, randomly chosen chunks. Here it turns out that, for comparable training times, our approach is significantly faster during testing and in most cases also significantly reduces the test error. Furthermore, we show that our approach easily scales up to 10 million training samples: including hyper-parameter selection using cross-validation, the entire training takes only a few hours on a single machine. Finally, we report an experiment on 32 million training samples. All experiments used liquidSVM (Steinwart and Thomann, 2017).
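The core idea of a spatial decomposition can be illustrated with a small sketch. It partitions the input space into Voronoi cells, trains one Gaussian-kernel hinge-loss SVM per cell on that cell's samples only, and at test time routes each point to its cell, so only one small local model is evaluated. This is not liquidSVM's implementation. Several choices are assumptions made for illustration: cell centers are chosen by greedy farthest-point selection, and the local solver is kernelized Pegasos (stochastic subgradient descent on the SVM objective without offset) rather than the solver used in the paper. The names `SpatialSVM`, `farthest_point_centers` and `kernel_pegasos` are hypothetical.

```python
import math
import random


def sq_dist(x, z):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, z))


def gaussian_kernel(x, z, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sq_dist(x, z))


def farthest_point_centers(X, k):
    """Greedy farthest-point selection of k cell centers (assumed strategy)."""
    centers = [X[0]]
    min_d = [sq_dist(x, X[0]) for x in X]
    while len(centers) < k:
        i = max(range(len(X)), key=lambda j: min_d[j])
        centers.append(X[i])
        min_d = [min(d, sq_dist(x, X[i])) for d, x in zip(min_d, X)]
    return centers


def nearest_center(x, centers):
    """Index of the Voronoi cell containing x."""
    return min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))


def kernel_pegasos(X, y, lam, gamma, iters, seed=0):
    """Kernelized Pegasos for the hinge-loss SVM without offset (swapped-in solver).
    Returns coefficients c_j such that f(x) = sum_j c_j * y_j * k(x_j, x)."""
    rng = random.Random(seed)
    alpha = [0] * len(X)
    for t in range(1, iters + 1):
        i = rng.randrange(len(X))
        s = sum(a * y[j] * gaussian_kernel(X[j], X[i], gamma)
                for j, a in enumerate(alpha) if a)
        if y[i] * s / (lam * t) < 1:  # hinge loss is active for sample i at step t
            alpha[i] += 1
    return [a / (lam * iters) for a in alpha]


class SpatialSVM:
    """Hypothetical sketch: one local Gaussian-kernel SVM per spatial cell."""

    def __init__(self, n_cells, lam=0.01, gamma=1.0, iters=300):
        self.n_cells, self.lam, self.gamma, self.iters = n_cells, lam, gamma, iters

    def fit(self, X, y):
        self.centers = farthest_point_centers(X, self.n_cells)
        self.cells = [([], []) for _ in self.centers]
        for x, label in zip(X, y):
            Xc, yc = self.cells[nearest_center(x, self.centers)]
            Xc.append(x)
            yc.append(label)
        # Each local problem only sees its own cell, so it stays small.
        self.coefs = [kernel_pegasos(Xc, yc, self.lam, self.gamma, self.iters) if Xc else []
                      for Xc, yc in self.cells]
        return self

    def decision_function(self, x):
        # Only the model of x's cell is evaluated at test time.
        j = nearest_center(x, self.centers)
        Xc, yc = self.cells[j]
        return sum(c * yj * gaussian_kernel(xj, x, self.gamma)
                   for c, yj, xj in zip(self.coefs[j], yc, Xc) if c)

    def predict(self, x):
        return 1 if self.decision_function(x) >= 0 else -1


# Usage: four well-separated clusters with XOR labels (not linearly separable).
rng = random.Random(1)
X, y = [], []
for cx, cy in [(5, 5), (-5, 5), (-5, -5), (5, -5)]:
    label = 1 if cx * cy > 0 else -1
    for _ in range(15):
        X.append((cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5)))
        y.append(label)

model = SpatialSVM(n_cells=4).fit(X, y)
queries = [(5.2, 4.9), (-4.8, 5.1), (-5.0, -5.3), (4.9, -5.1)]
preds = [model.predict(p) for p in queries]
print(preds)  # → [1, -1, 1, -1]
```

With well-separated clusters, each cell here ends up holding exactly one cluster, so every local problem involves only 15 samples while the overall classifier still handles the XOR pattern. On real data the cells would typically be much larger and would contain both labels, and the number of cells, `gamma` and `lam` would be chosen per cell, for example by cross-validation.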

[1] B. Carl, et al. Entropy, Compactness and the Approximation of Operators, 1990.

[2] Zoltán Szabó, et al. Optimal Rates for Random Fourier Features, 2015, NIPS.

[3] Jason Weston, et al. Large-scale kernel machines, 2007.

[4] Ingo Steinwart, et al. Optimal learning rates for least squares SVMs using Gaussian kernels, 2011, NIPS.

[5] Martin J. Wainwright, et al. Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates, 2013, J. Mach. Learn. Res.

[6] Rong Jin, et al. Localized Support Vector Machine and Its Efficient Algorithm, 2007, SDM.

[7] John C. Platt, et al. Fast training of support vector machines using sequential minimal optimization, in Advances in Kernel Methods, 1999.

[8] A. Tsybakov, et al. Optimal aggregation of classifiers in statistical learning, 2003.

[9] Rong Jin, et al. Efficient Algorithm for Localized Support Vector Machine, 2010, IEEE Transactions on Knowledge and Data Engineering.

[10] Benjamin Recht, et al. Random Features for Large-Scale Kernel Machines, 2007, NIPS.

[11] Ingo Steinwart, et al. Fast rates for support vector machines using Gaussian kernels, 2007, arXiv:0708.1838.

[12] Don R. Hush, et al. Training SVMs Without Offset, 2011, J. Mach. Learn. Res.

[13] Andreas Christmann, et al. Support vector machines, 2008, Data Mining and Knowledge Discovery Handbook.

[14] Chi-Jen Lu, et al. Tree Decomposition for Large-Scale SVM Problems, 2010, 2010 International Conference on Technologies and Applications of Artificial Intelligence.

[15] Ingo Steinwart, et al. Optimal Learning Rates for Localized SVMs, 2015, J. Mach. Learn. Res.

[16] Nello Cristianini, et al. Large Margin Trees for Induction and Transduction, 1999, ICML.

[17] Matthias W. Seeger, et al. Using the Nyström Method to Speed Up Kernel Machines, 2000, NIPS.

[18] Lorenzo Rosasco, et al. Iterative Regularization for Learning with Convex Loss Functions, 2015, J. Mach. Learn. Res.

[19] Léon Bottou, et al. Local Algorithms for Pattern Recognition and Dependencies Estimation, 1993, Neural Computation.

[20] Robert Hable. Universal consistency of localized versions of regularized kernel methods, 2013, J. Mach. Learn. Res.

[21] Léon Bottou, et al. Local Learning Algorithms, 1992, Neural Computation.

[22] Lorenzo Rosasco, et al. Learning with Incremental Iterative Regularization, 2014, NIPS.

[23] Ingo Steinwart, et al. liquidSVM: A Fast and Versatile SVM package, 2017, ArXiv.

[24] Jitendra Malik, et al. SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[25] K. Bennett, et al. A support vector machine approach to decision trees, 1998, 1998 IEEE International Joint Conference on Neural Networks Proceedings. IEEE World Congress on Computational Intelligence (Cat. No.98CH36227).