Fast kernel SVM training via support vector identification

Training a kernel SVM on a large dataset has high computational complexity and requires a large amount of memory. However, a desirable property of the SVM is that its decision function is determined solely by the support vectors, a subset of training examples with non-vanishing weights. This motivates a novel, efficient algorithm for training a kernel SVM via support vector identification. The algorithm involves two steps. In the first step, we randomly sample a small subset of the training data without replacement, repeat this several times, and train a kernel SVM on each subset; the resulting models are then used to identify the support vectors that lie on the margin. In the second step, we solve an optimization problem to estimate the Lagrange multipliers corresponding to these support vectors. With the support vectors and their Lagrange multipliers in hand, we can approximate the decision function of the kernel SVM. Because standard kernel SVM training algorithms have cubic complexity in the number of training examples, training many kernel SVMs on small subsets is much more efficient than training a single kernel SVM on the whole training set, especially for large datasets. Our algorithm therefore scales better than the standard kernel SVM. Moreover, the SVMs on different subsets can be trained independently, so the algorithm is easy to parallelize for further speedup. Since our algorithm identifies only the support vectors on the margin, it produces fewer support vectors than the standard kernel SVM, which also makes prediction faster. Experimental results show that our method outperforms state-of-the-art methods and achieves performance on par with the kernel SVM while being much more efficient.
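
As a rough illustration of the scalability argument, assume (as stated above) that training cost grows with the cube of the training-set size, and take illustrative values that are our own and not from the paper's experiments: n = 10^5 training examples, k = 20 subsets, and m = 10^3 examples per subset. The second line recalls the standard textbook kernel SVM decision function, where \mathcal{S} denotes the set of support vectors and b the bias, to show why fewer support vectors means faster prediction.

$$
\frac{T_{\text{full}}}{T_{\text{subsets}}} \approx \frac{n^3}{k\,m^3} = \frac{(10^5)^3}{20 \cdot (10^3)^3} = \frac{10^{15}}{2 \times 10^{10}} = 5 \times 10^{4}
$$

$$
f(\mathbf{x}) = \operatorname{sign}\Big(\sum_{i \in \mathcal{S}} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\Big), \qquad \text{cost per test point} = O(|\mathcal{S}|) \text{ kernel evaluations}
$$

The ratio ignores the cost of the second step, which depends on how many margin support vectors are found. If the k independent subset problems are spread over p workers (p is a hypothetical symbol), the wall-clock cost of the first step drops further, to roughly k m^3 / p.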

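The two steps can be sketched in pure Python on a toy problem. This is a minimal illustration under our own assumptions, not the authors' implementation. The subset models are hypothetical bias-free RBF SVMs solved by dual coordinate ascent, and the values of GAMMA, C, the subset count, and the subset size are arbitrary. Because the abstract does not specify the optimization problem used in the second step, the sketch substitutes a simple stand-in that re-solves the SVM dual on the identified support vectors only. All function names (train_dual_cd, identify_support_vectors, estimate_multipliers, decision, make_data) are hypothetical.

```python
import math
import random

GAMMA = 0.5  # RBF kernel width (illustrative choice, not from the paper)
C = 10.0     # soft-margin penalty (illustrative choice, not from the paper)


def rbf(a, b):
    # Gaussian kernel K(a, b) = exp(-GAMMA * ||a - b||^2)
    return math.exp(-GAMMA * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def train_dual_cd(X, y, epochs=100):
    """Bias-free kernel SVM solved by dual coordinate ascent (an assumption).

    Maximizes sum(alpha) - 0.5 * sum_ij alpha_i alpha_j y_i y_j K(x_i, x_j)
    subject to 0 <= alpha_i <= C (no bias, hence no equality constraint).
    """
    n = len(X)
    K = [[rbf(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(epochs):
        for i in range(n):
            # Partial derivative of the dual objective with respect to alpha_i
            g = 1.0 - y[i] * sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            # Exact 1-D maximization, then projection onto the box [0, C]
            alpha[i] = min(max(alpha[i] + g / K[i][i], 0.0), C)
    return alpha


def identify_support_vectors(X, y, n_subsets=5, subset_size=40, rng=None):
    """Step 1: train on small random subsets and collect their margin SVs."""
    rng = rng or random.Random(0)
    candidates = set()
    for _ in range(n_subsets):
        idx = rng.sample(range(len(X)), subset_size)  # without replacement
        alpha = train_dual_cd([X[i] for i in idx], [y[i] for i in idx])
        # Margin SVs are strictly inside the box: 0 < alpha < C
        candidates.update(i for i, a in zip(idx, alpha) if 1e-8 < a < C - 1e-8)
    return sorted(candidates)


def estimate_multipliers(X, y, sv_idx):
    """Step 2 (hypothetical stand-in): re-solve the dual on the candidate SVs."""
    Xs, ys = [X[i] for i in sv_idx], [y[i] for i in sv_idx]
    return Xs, ys, train_dual_cd(Xs, ys)


def decision(x, Xs, ys, alpha):
    # f(x) = sum_i alpha_i y_i K(x_i, x); zero multipliers are skipped
    return sum(a * yi * rbf(xi, x) for a, yi, xi in zip(alpha, ys, Xs) if a > 0)


def make_data(n, rng):
    # Two overlapping Gaussian blobs centred at (-1, -1) and (+1, +1)
    X, y = [], []
    for _ in range(n):
        label = rng.choice((-1, 1))
        X.append((rng.gauss(label, 1.0), rng.gauss(label, 1.0)))
        y.append(label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_data(400, rng)
    sv_idx = identify_support_vectors(X, y, rng=rng)
    Xs, ys, alpha = estimate_multipliers(X, y, sv_idx)
    acc = sum((decision(x, Xs, ys, alpha) > 0) == (t > 0) for x, t in zip(X, y)) / len(X)
    print(f"{len(sv_idx)} candidate SVs, training accuracy {acc:.2f}")
```

Dropping the bias term removes the equality constraint from the dual, which is what lets each multiplier be updated on its own; a faithful implementation would instead train the subset models with a full SVM solver such as LIBSVM.
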