Clustered Support Vector Machines

In many problems of machine learning, the data are distributed nonlinearly. One way to address this kind of data is training a nonlinear classifier such as kernel support vector machine (kernel SVM). However, the computational burden of kernel SVM limits its application to large scale datasets. In this paper, we propose a Clustered Support Vector Machine (CSVM), which tackles the data in a divide and conquer manner. More specifically, CSVM groups the data into several clusters, followed which it trains a linear support vector machine in each cluster to separate the data locally. Meanwhile, CSVM has an additional global regularization, which requires the weight vector of each local linear SVM aligning with a global weight vector. The global regularization leverages the information from one cluster to another, and avoids over-fitting in each cluster. We derive a data-dependent generalization error bound for CSVM, which explains the advantage of CSVM over linear SVM. Experiments on several benchmark datasets show that the proposed method outperforms linear SVM and some other related locally linear classifiers. It is also comparable to a fine-tuned kernel SVM in terms of prediction performance, while it is more efficient than kernel SVM.

[1]  Bernhard Schölkopf,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2005, IEEE Transactions on Neural Networks.

[2]  Chih-Jen Lin,et al.  A dual coordinate descent method for large-scale linear SVM , 2008, ICML '08.

[3]  Robert Tibshirani,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition , 2001, Springer Series in Statistics.

[4]  Jason Weston,et al.  Fast Kernel Classifiers with Online and Active Learning , 2005, J. Mach. Learn. Res..

[5]  Geoffrey E. Hinton,et al.  Adaptive Mixtures of Local Experts , 1991, Neural Computation.

[6]  Peter L. Bartlett,et al.  Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , 2003, J. Mach. Learn. Res..

[7]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[8]  Ning Chen,et al.  Infinite SVM: a Dirichlet Process Mixture of Large-margin Kernel Machines , 2011, ICML.

[9]  J. Suykens,et al.  Ensemble Learning of Coupled Parmeterised Kernel Models , 2003 .

[10]  Yihong Gong,et al.  Nonlinear Learning using Local Coordinate Coding , 2009, NIPS.

[11]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[12]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[13]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[14]  Samy Bengio,et al.  A Parallel Mixture of SVMs for Very Large Scale Problems , 2001, Neural Computation.

[15]  Massimiliano Pontil,et al.  Regularized multi--task learning , 2004, KDD.

[16]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[17]  D. Ruppert The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[18]  Enrico Blanzieri,et al.  Fast and Scalable Local Kernel Machines , 2010, J. Mach. Learn. Res..

[19]  Thorsten Joachims,et al.  Training linear SVMs in linear time , 2006, KDD '06.

[20]  Anthony Widjaja,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.

[21]  Qi Tian,et al.  Locality-sensitive support vector machine by exploring local correlation and global regularization , 2011, CVPR 2011.

[22]  Ronald A. Cole,et al.  Spoken Letter Recognition , 1990, HLT.

[23]  Shai Ben-David,et al.  A framework for statistical clustering with constant time approximation algorithms for K-median and K-means clustering , 2007, Machine Learning.

[24]  Léon Bottou,et al.  Local Learning Algorithms , 1992, Neural Computation.

[25]  Nathan Srebro,et al.  Beating SGD: Learning SVMs in Sublinear Time , 2011, NIPS.

[26]  Jun Zhou,et al.  Mixing Linear SVMs for Nonlinear Classification , 2010, IEEE Transactions on Neural Networks.

[27]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[28]  Philip H. S. Torr,et al.  Locally Linear Support Vector Machines , 2011, ICML.

[29]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[30]  Jitendra Malik,et al.  SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[31]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[32]  Ohad Shamir,et al.  There's a Hole in My Data Space: Piecewise Predictors for Heterogeneous Learning Problems , 2012, AISTATS.