Core Vector Machines: Fast SVM Training on Very Large Data Sets

Standard SVM training has O(m^3) time and O(m^2) space complexities, where m is the training set size. It is thus computationally infeasible on very large data sets. By observing that practical SVM implementations only approximate the optimal solution by an iterative strategy, we scale up kernel methods by exploiting such "approximateness" in this paper. We first show that many kernel methods can be equivalently formulated as minimum enclosing ball (MEB) problems in computational geometry. Then, by adopting an efficient approximate MEB algorithm, we obtain provably approximately optimal solutions with the idea of core sets. Our proposed Core Vector Machine (CVM) algorithm can be used with nonlinear kernels and has a time complexity that is linear in m and a space complexity that is independent of m. Experiments on large toy and real-world data sets demonstrate that the CVM is as accurate as existing SVM implementations, but is much faster and can handle much larger data sets than existing scale-up methods. For example, CVM with the Gaussian kernel produces superior results on the KDDCUP-99 intrusion detection data, which has about five million training patterns, in only 1.4 seconds on a 3.2 GHz Pentium 4 PC.
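
To make the approximate-MEB idea concrete, the following is a minimal sketch and not the CVM algorithm itself. It assumes plain Euclidean points rather than a kernel-induced feature space, and it swaps in the simple Badoiu-Clarkson center update in place of re-solving an optimization problem on the core set. The function name approx_meb and the parameter eps are hypothetical, chosen only for illustration. Each iteration makes one linear pass over the data to find the point farthest from the current center, and the number of iterations depends only on eps, not on the number of points.

```python
import math


def approx_meb(points, eps=0.1):
    """Approximate minimum enclosing ball of a list of Euclidean points.

    Illustrative sketch (assumption: simple Badoiu-Clarkson update, not the
    CVM procedure). Returns (center, radius, core_indices).
    """
    center = list(points[0])          # start at an arbitrary data point
    core = {0}                        # indices of points that drove an update
    n_iter = math.ceil(1.0 / eps ** 2)

    for i in range(1, n_iter + 1):
        # One linear pass: find the point farthest from the current center.
        far = max(range(len(points)),
                  key=lambda j: math.dist(points[j], center))
        core.add(far)
        # Move the center a shrinking fraction of the way toward that point.
        step = 1.0 / (i + 1)
        center = [c + step * (p - c) for c, p in zip(center, points[far])]

    # Radius needed for the final center to enclose every point.
    radius = max(math.dist(p, center) for p in points)
    return center, radius, sorted(core)


if __name__ == "__main__":
    pts = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0),
           (0.2, 0.1), (-0.3, 0.4)]
    center, radius, core = approx_meb(pts, eps=0.1)
    print("center:", center)
    print("radius:", radius)
    print("core-set indices:", core)
```

In this sketch the set of indices in core plays the role of a core set: only those points ever influence the center, so its size is bounded by the iteration count rather than by the size of the data set.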
