Building Support Vector Machines with Reduced Classifier Complexity

Support vector machines (SVMs), though accurate, are not preferred in applications requiring high classification speed, because the number of support vectors is typically large. To overcome this problem we devise a primal method with the following properties: (1) it decouples the idea of basis functions from the concept of support vectors; (2) it greedily finds a set of kernel basis functions of a specified maximum size (d_max) that approximates the SVM primal cost function well; (3) it is efficient, scaling roughly as O(n·d_max^2), where n is the number of training examples; and (4) the number of basis functions it requires to achieve accuracy close to that of the SVM is usually far smaller than the number of SVM support vectors.
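
The abstract's idea is concrete enough to sketch. Below is a minimal NumPy illustration of greedy selection of kernel basis functions against a primal L2-SVM (squared hinge loss) objective. It is an assumption-laden sketch, not the paper's algorithm: the RBF kernel with width gamma, the regularizer lam, the random candidate pool, and the naive full refit after each addition are all illustrative choices, and the paper achieves its O(n·d_max^2) scaling with much cheaper incremental updates than the refit used here.

```python
import numpy as np

def rbf(X, centers, gamma=1.0):
    """RBF kernel matrix between the rows of X and the basis centers."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def fit_beta(K, y, lam=1e-2, iters=20):
    """Minimize lam*||beta||^2 + sum_i max(0, 1 - y_i*(K beta)_i)^2.

    The squared hinge loss is piecewise quadratic, so repeatedly solving
    a least-squares problem on the current active set (a finite-Newton
    style iteration) typically converges in a few steps."""
    beta = np.zeros(K.shape[1])
    for _ in range(iters):
        sv = y * (K @ beta) < 1.0          # examples violating the margin
        A = lam * np.eye(K.shape[1]) + K[sv].T @ K[sv]
        beta = np.linalg.solve(A, K[sv].T @ y[sv])
    return beta

def primal_obj(K, y, beta, lam):
    """Regularized squared-hinge primal objective."""
    hinge = np.maximum(0.0, 1.0 - y * (K @ beta))
    return lam * beta @ beta + (hinge ** 2).sum()

def greedy_basis_svm(X, y, d_max=10, gamma=1.0, lam=1e-2, pool=50, seed=0):
    """Greedily grow up to d_max kernel basis centers chosen from the
    training points, keeping the candidate that most lowers the primal
    objective after refitting the coefficients."""
    rng = np.random.default_rng(seed)
    centers = np.empty((0, X.shape[1]))
    beta = np.zeros(0)
    for _ in range(d_max):
        cand = rng.choice(len(X), size=min(pool, len(X)), replace=False)
        best = None
        for i in cand:
            C = np.vstack([centers, X[i]])
            K = rbf(X, C, gamma)
            b = fit_beta(K, y, lam)
            obj = primal_obj(K, y, b, lam)
            if best is None or obj < best[0]:
                best = (obj, C, b)
        _, centers, beta = best
    return centers, beta

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] * X[:, 1] > 0, 1.0, -1.0)   # XOR-like labels
    centers, beta = greedy_basis_svm(X, y, d_max=8)
    acc = (np.sign(rbf(X, centers) @ beta) == y).mean()
    print(f"{len(centers)} basis functions, training accuracy {acc:.2f}")
```

Even this crude version shows the abstract's point: the classifier is defined by a small, explicitly bounded set of greedily chosen basis functions rather than by every support vector; the paper's contribution lies in performing the selection and refitting efficiently.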
