Pruning Support Vector Machines Without Altering Performances

Support vector machines (SVMs) have many merits that distinguish them from other machine-learning algorithms, such as the absence of local minima, the maximization of the margin between the separating hyperplane and the support vectors (SVs), and a solid theoretical foundation. However, SVM training algorithms such as the efficient sequential minimal optimization (SMO) often produce many SVs. Some scholars have observed that the kernel outputs are frequently of similar magnitudes, which suggests that many SVs are redundant. By analyzing the overlapping information in the kernel outputs, a succinct method for pruning the dispensable SVs while securing the separating hyperplane, based on crosswise propagation (CP), is systematically developed. The method also circumvents the problem of explicitly identifying SVs in feature space, which the SVM formulation otherwise entails. Experiments with the well-known SMO-based software LibSVM reveal that all typical kernels, across different parameters and data sets, produce dispensable SVs. Roughly 1% to 9% (in some scenarios, more than 50%) of the SVs are found to be dispensable. Furthermore, the experimental results verify that the pruning method does not alter the SVMs' performance at all. As a corollary, this paper further contributes, in theory, a new, lower upper bound on the number of SVs in the high-dimensional feature space.
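
To make the redundancy argument concrete, the sketch below illustrates the underlying idea in the simplest possible form: if two SVs produce (near-)identical rows of kernel outputs, one SV's coefficient can be folded into the other's, and the decision function f(x) = sum_i beta_i k(x_i, x) + b is left essentially unchanged. This is only a minimal illustration, not the paper's CP algorithm; the RBF kernel choice, the tolerance `tol`, the reference set `X_ref`, and the greedy merging rule are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian RBF kernel matrix between the rows of X and the rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def prune_redundant_svs(sv, beta, X_ref, gamma=0.5, tol=1e-6):
    """Merge SVs whose kernel-output rows on X_ref are (near-)identical.

    sv    : (m, d) array of support vectors
    beta  : (m,)   coefficients alpha_i * y_i
    X_ref : (n, d) reference points on which kernel outputs are compared
    Returns a pruned (sv, beta) pair; the decision function is preserved
    up to the tolerance tol (exactly, when rows coincide).
    """
    K = rbf_kernel(sv, X_ref, gamma)  # one kernel-output row per SV
    keep, merged_beta = [], []
    for i in range(len(sv)):
        for j_idx, j in enumerate(keep):
            # If SV i's row duplicates a kept SV's row, fold beta_i into it.
            if np.max(np.abs(K[i] - K[j])) < tol:
                merged_beta[j_idx] += beta[i]
                break
        else:
            keep.append(i)
            merged_beta.append(beta[i])
    return sv[keep], np.array(merged_beta)

def decision(x, sv, beta, b=0.0, gamma=0.5):
    """SVM decision value f(x) = sum_i beta_i k(sv_i, x) + b."""
    return rbf_kernel(np.atleast_2d(x), sv, gamma) @ beta + b
```

Under these assumptions, evaluating `decision` with the pruned `(sv, beta)` agrees with the unpruned decision values up to `tol`, which mirrors the paper's claim that removing dispensable SVs leaves the classifier's outputs, and hence its performance, unchanged.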
