Adaptive constraint reduction for training support vector machines.

A support vector machine (SVM) determines whether a given observed pattern lies in a particular class. The decision is based on prior training of the SVM on a set of patterns with known classification, and training is achieved by solving a convex quadratic programming problem. Since the number of training patterns is typically large, this training can be expensive. In this work, we propose an adaptive constraint-reduction primal-dual interior-point method for training a linear SVM with an $\ell_1$ penalty (hinge loss) for misclassification. We reduce the computational effort by assembling the normal-equation matrix using only a well-chosen subset of the patterns. Starting with a large portion of the patterns, our algorithm excludes more and more unnecessary patterns as the iterations proceed. We extend our approach to training nonlinear SVMs through Gram matrix approximation methods, and we demonstrate the effectiveness of the algorithm on a variety of standard test problems.
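To make the constraint-reduction idea concrete, here is a minimal sketch of the quantities involved, written in standard SVM/interior-point notation (the symbols $X$, $Y$, $\Omega$, and the working set $\mathcal{Q}$ are our choices for illustration, not necessarily the paper's). Training a linear SVM with an $\ell_1$ hinge-loss penalty amounts to the convex quadratic program

$$
\min_{w,\gamma,\xi}\;\; \tfrac{1}{2}\|w\|_2^2 + C\, e^{\mathsf T}\xi
\quad\text{s.t.}\quad Y(Xw - \gamma e) \ge e - \xi,\;\; \xi \ge 0,
$$

where the $m$ rows of $X$ are the training patterns, $Y = \operatorname{diag}(y)$ holds the $\pm 1$ labels, and $e$ is the all-ones vector. In a primal-dual interior-point method, the dominant per-iteration cost is assembling the $n \times n$ normal-equations matrix, which has the structure

$$
M \;=\; I + X^{\mathsf T} Y \Omega Y X \;=\; I + \sum_{i=1}^{m} \omega_i\, x_i x_i^{\mathsf T}
\qquad (\text{since } y_i^2 = 1),
$$

with $\Omega = \operatorname{diag}(\omega)$ a positive scaling determined by the current iterate. Constraint reduction replaces the full sum with a sum over an adaptively chosen subset $\mathcal{Q} \subseteq \{1,\dots,m\}$ of patterns that remain candidate support vectors,

$$
M_{\mathcal{Q}} \;=\; I + \sum_{i \in \mathcal{Q}} \omega_i\, x_i x_i^{\mathsf T},
$$

so the assembly cost drops from $O(mn^2)$ toward $O(|\mathcal{Q}|\, n^2)$ as $|\mathcal{Q}|$ shrinks over the iterations.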
