Generalized Core Vector Machines

Kernel methods, such as the support vector machine (SVM), are often formulated as quadratic programming (QP) problems. However, given m training patterns, a naive implementation of the QP solver takes O(m 3) training time and at least O(m2) space. Hence, scaling up these QPs is a major stumbling block in applying kernel methods on very large data sets, and a replacement of the naive method for finding the QP solutions is highly desirable. Recently, by using approximation algorithms for the minimum enclosing ball (MEB) problem, we proposed the core vector machine (CVM) algorithm that is much faster and can handle much larger data sets than existing SVM implementations. However, the CVM can only be used with certain kernel functions and kernel methods. For example, the very popular support vector regression (SVR) cannot be used with the CVM. In this paper, we introduce the center-constrained MEB problem and subsequently extend the CVM algorithm. The generalized CVM algorithm can now be used with any linear/nonlinear kernel and can also be applied to kernel methods such as SVR and the ranking SVM. Moreover, like the original CVM, its asymptotic time complexity is again linear in m and its space complexity is independent of m. Experiments show that the generalized CVM has comparable performance with state-of-the-art SVM and SVR implementations, but is faster and produces fewer support vectors on very large data sets

[1]  David R. Musicant,et al.  Active set support vector regression , 2004, IEEE Transactions on Neural Networks.

[2]  Andrew P. Bradley,et al.  The use of the area under the ROC curve in the evaluation of machine learning algorithms , 1997, Pattern Recognit..

[3]  Thorsten Joachims,et al.  Making large-scale support vector machine learning practical , 1999 .

[4]  J. Friedman Multivariate adaptive regression splines , 1990 .

[5]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[6]  Bernhard Schölkopf,et al.  Estimating the Support of a High-Dimensional Distribution , 2001, Neural Computation.

[7]  Kenneth L. Clarkson,et al.  Optimal core-sets for balls , 2008, Comput. Geom..

[8]  Nello Cristianini,et al.  The Kernel-Adatron Algorithm: A Fast and Simple Learning Procedure for Support Vector Machines , 1998, ICML.

[9]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[10]  Jianchang Mao,et al.  Scaling-up support vector machines using boosting algorithm , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[11]  Jerome H. Friedman Multivariate adaptive regression splines (with discussion) , 1991 .

[12]  Yuh-Jye Lee,et al.  RSVM: Reduced Support Vector Machines , 2001, SDM.

[13]  Bernhard Schölkopf,et al.  A tutorial on support vector regression , 2004, Stat. Comput..

[14]  Kasturi R. Varadarajan,et al.  Geometric Approximation via Coresets , 2007 .

[15]  Joseph S. B. Mitchell,et al.  Approximate minimum enclosing balls in high dimensions using core-sets , 2003, ACM J. Exp. Algorithmics.

[16]  Brian Wyvill,et al.  Introduction to Implicit Surfaces , 1997 .

[17]  Alexander J. Smola,et al.  Learning with kernels , 1998 .

[18]  Ivor W. Tsang,et al.  Very Large SVM Training using Core Vector Machines , 2005, AISTATS.

[19]  Matthias W. Seeger,et al.  Using the Nyström Method to Speed Up Kernel Machines , 2000, NIPS.

[20]  Norikazu Takahashi,et al.  Rigorous proof of termination of SMO algorithm for support vector Machines , 2005, IEEE Transactions on Neural Networks.

[21]  Sariel Har-Peled,et al.  Shape fitting with outliers , 2003, SCG '03.

[22]  Samy Bengio,et al.  A Parallel Mixture of SVMs for Very Large Scale Problems , 2001, Neural Computation.

[23]  Robert P. W. Duin,et al.  Support vector domain description , 1999, Pattern Recognit. Lett..

[24]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[25]  Katya Scheinberg,et al.  Efficient SVM Training Using Low-Rank Kernel Representations , 2002, J. Mach. Learn. Res..

[26]  Bernhard Schölkopf,et al.  Sparse Greedy Matrix Approximation for Machine Learning , 2000, International Conference on Machine Learning.

[27]  Wei Chu,et al.  An improved conjugate gradient scheme to the solution of least squares SVM , 2005, IEEE Transactions on Neural Networks.

[28]  Volker Tresp,et al.  Scaling Kernel-Based Systems to Large Data Sets , 2001, Data Mining and Knowledge Discovery.

[29]  Jason Weston,et al.  Dealing with large diagonals in kernel matrices , 2003 .

[30]  Federico Girosi,et al.  Training support vector machines: an application to face detection , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[31]  Ivor W. Tsang,et al.  Core Vector Regression for very large regression problems , 2005, ICML.

[32]  John C. Platt,et al.  Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .

[33]  Gunnar Rätsch,et al.  Large scale genomic sequence SVM classifiers , 2005, ICML.

[34]  Bernhard Schölkopf,et al.  Sampling Techniques for Kernel Methods , 2001, NIPS.

[35]  Ivor W. Tsang,et al.  Core Vector Machines: Fast SVM Training on Very Large Data Sets , 2005, J. Mach. Learn. Res..

[36]  David R. Musicant,et al.  Lagrangian Support Vector Machines , 2001, J. Mach. Learn. Res..