Incremental approximate matrix factorization for speeding up support vector machines

Traditional decomposition-based solvers for Support Vector Machines (SVMs) suffer from a well-known scalability problem. For example, on a training set of one million instances, SVMLight takes about six days to run on a Pentium 4 server with 8 GB of memory. In this paper, we propose an incremental algorithm that performs approximate matrix-factorization operations to speed up SVM training. Two approximate factorization schemes, Kronecker and incomplete Cholesky, are used within a primal-dual interior-point method (IPM) to solve the quadratic optimization problem in SVMs directly. We find that a coarse approximation enjoys good speedup but may suffer from poor training accuracy, whereas a fine-grained approximation enjoys good training quality but may suffer from long training time. We therefore propose an incremental training algorithm that uses the approximate IPM solution obtained under a coarse factorization to initialize the IPM under a fine-grained factorization. Extensive empirical studies show that the proposed incremental algorithm with approximate factorizations substantially speeds up SVM training while maintaining high training accuracy. In addition, we show that the proposed algorithm is highly parallelizable on an Intel dual-core processor.
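
To make the incomplete Cholesky scheme concrete, the following is a minimal NumPy sketch of pivoted incomplete Cholesky factorization of a kernel matrix; the function name and the `tol` and `max_rank` parameters are illustrative, not taken from the paper. It produces a tall factor G with K ≈ GGᵀ, where a small rank cap corresponds to a coarse factorization and a larger cap to a fine-grained one.

```python
import numpy as np

def incomplete_cholesky(K, tol=1e-6, max_rank=None):
    """Pivoted incomplete Cholesky: find G (n x k) with K ~= G @ G.T.

    K        -- symmetric positive semi-definite kernel matrix (n x n)
    tol      -- stop once the largest residual diagonal entry drops below tol
    max_rank -- cap on k; a small cap gives a coarse factorization,
                a larger cap a fine-grained one (illustrative parameter)
    """
    n = K.shape[0]
    k_max = n if max_rank is None else min(max_rank, n)
    G = np.zeros((n, k_max))
    d = np.diag(K).astype(float).copy()   # residual diagonal of K - G @ G.T
    for k in range(k_max):
        i = int(np.argmax(d))             # greedy pivot: largest residual
        if d[i] <= tol:
            return G[:, :k]               # residual is negligible; stop early
        # New column via a Schur-complement update against columns chosen so far;
        # the pivot row gets G[i, k] = sqrt(d[i]) and its residual drops to zero.
        G[:, k] = (K[:, i] - G[:, :k] @ G[i, :k]) / np.sqrt(d[i])
        d -= G[:, k] ** 2                 # shrink the residual diagonal
    return G
```

With the kernel matrix in this low-rank form, each Newton system inside the IPM involves a matrix of the form D + GGᵀ with D diagonal, which the Sherman-Morrison-Woodbury identity reduces to a k×k solve, cutting the per-iteration cost from O(n³) to roughly O(nk²). This is the standard low-rank trick for IPM-based SVM training; the paper's exact formulation may differ in detail.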
