Condensed Vector Machines: Learning Fast Machine for Large Data

Scalability is one of the main challenges for kernel methods and support vector machines (SVMs). The quadratic memory required to store the kernel matrix makes it impossible to train on datasets with millions of examples. Sophisticated decomposition algorithms have been proposed to train SVMs efficiently using only the important examples, which ideally are the final support vectors (SVs). However, decomposition methods remain limited in large-scale applications where the number of SVs alone exceeds a computer's capacity. From another perspective, a large number of SVs also slows SVMs down in the testing phase, making them impractical for many applications. In this paper, we integrate a vector combination scheme, which simplifies the SVM solution, into an incremental working set selection procedure for SVM training. The main objective of this integration is to keep the number of final SVs minimal, yielding a minimal resource demand and a faster training time. Consequently, the learned machines are more compact and run faster thanks to the small number of vectors in their solutions. Experimental results on large benchmark datasets show that the proposed condensed SVMs achieve both training and testing efficiency while maintaining generalization ability equivalent to that of standard SVMs.
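To make the condensation step concrete, the sketch below illustrates one plausible form of the vector combination scheme the abstract refers to: replacing two weighted support vectors with a single vector whose feature-space image approximates their weighted sum, in the spirit of bottom-up reduced-set simplification (Nguyen and Ho, 2006). This is a minimal sketch under stated assumptions, not the paper's exact procedure: it assumes a Gaussian RBF kernel and two same-sign coefficients, and the names `rbf` and `combine_pair` are illustrative.

```python
import numpy as np

def rbf(x, y, gamma):
    """Gaussian RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def combine_pair(xi, xj, ai, aj, gamma, iters=20):
    """Replace two weighted SVs (ai, xi) and (aj, xj) with a single
    vector z = h*xi + (1-h)*xj whose feature-space image approximates
    ai*phi(xi) + aj*phi(xj).

    Assumptions (illustrative, not the paper's exact scheme):
    - RBF kernel, and ai, aj > 0 (same-class SVs), so the
      fixed-point update below is well defined.
    """
    h = ai / (ai + aj)  # initial guess: weight by coefficient mass
    for _ in range(iters):
        z = h * xi + (1.0 - h) * xj
        ki = ai * rbf(xi, z, gamma)
        kj = aj * rbf(xj, z, gamma)
        h = ki / (ki + kj)  # fixed-point update of the mixing weight
    z = h * xi + (1.0 - h) * xj
    # New coefficient: preserve the projection of the original
    # pair onto the image of the combined vector phi(z).
    az = ai * rbf(xi, z, gamma) + aj * rbf(xj, z, gamma)
    return z, az
```

In a condensed training loop of this kind, such a combination would be applied repeatedly, after each working-set update, to the pair of same-class SVs whose merge perturbs the decision function least, keeping the solution within a fixed SV budget throughout training rather than simplifying it only after convergence.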
