Sparse least squares support vector training in the reduced empirical feature space

In this paper we discuss sparse least squares support vector machines (sparse LS SVMs) trained in the empirical feature space, which is spanned by the mapped training data. First, we show that the kernel associated with the empirical feature space gives the same value as the kernel associated with the feature space whenever one of its arguments is mapped into the empirical feature space by the mapping function associated with the feature space. Using this fact, we show that kernel-based methods can be trained and tested in the empirical feature space, and that training LS SVMs there reduces to solving a set of linear equations. We then derive sparse LS SVMs by restricting the solution to the linearly independent training data in the empirical feature space, selected by Cholesky factorization. The support vectors are the selected training data, and they do not change even if the value of the margin parameter is changed. Thus, for linear kernels, the number of support vectors is at most the number of input variables. Computer experiments show that the number of support vectors can be reduced without deteriorating the generalization ability.
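The following is a minimal Python sketch of the procedure the abstract describes: selecting linearly independent training data by incremental Cholesky factorization of the kernel matrix, mapping the data into the reduced empirical feature space, and training the LS SVM by solving a set of linear equations. This is not the authors' code; the RBF kernel, the tolerance `tol`, the margin parameter `C`, and the toy data are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # K(x, x') = exp(-gamma * ||x - x'||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def select_independent(K, tol=1e-6):
    """Greedy incremental Cholesky factorization of the kernel matrix.
    A sample is kept only if its mapped image is numerically linearly
    independent of those already selected; the kept indices are the
    support vectors."""
    n = K.shape[0]
    selected, L_rows = [], []
    for i in range(n):
        if selected:
            L = np.array(L_rows)                    # Cholesky factor so far
            r = np.linalg.solve(L, K[selected, i])  # new off-diagonal row
        else:
            r = np.zeros(0)
        pivot = K[i, i] - r @ r
        if pivot > tol:                             # independent: keep it
            selected.append(i)
            L_rows = [np.append(row, 0.0) for row in L_rows]
            L_rows.append(np.append(r, np.sqrt(pivot)))
    return selected, np.array(L_rows)

def empirical_map(X, Z, L, gamma=1.0):
    """Map data into the reduced empirical feature space, h(x) = L^{-1} k_Z(x),
    so that h(x)^T h(x') reproduces K(x, x') when one argument lies in the
    span of the selected (mapped) training data."""
    Kz = rbf_kernel(X, Z, gamma)                    # (n_samples, n_selected)
    return np.linalg.solve(L, Kz.T).T

def train_ls_svm(H, y, C=10.0):
    """LS SVM in the empirical feature space reduces to a linear system:
    minimize (1/2)||w||^2 + (C/2) sum_i (y_i - w^T h_i - b)^2."""
    n, m = H.shape
    A = np.zeros((m + 1, m + 1))
    A[:m, :m] = H.T @ H + np.eye(m) / C
    A[:m, m] = H.sum(axis=0)                        # H^T 1
    A[m, :m] = H.sum(axis=0)                        # 1^T H
    A[m, m] = n
    rhs = np.concatenate([H.T @ y, [y.sum()]])
    sol = np.linalg.solve(A, rhs)
    return sol[:m], sol[m]                          # weights w and bias b

# Toy usage on two Gaussian blobs (illustrative data only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(1, 0.5, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])
K = rbf_kernel(X, X)
sv_idx, L = select_independent(K)
H = empirical_map(X, X[sv_idx], L)                  # test data would be mapped the same way
w, b = train_ls_svm(H, y)
pred = np.sign(H @ w + b)
print(len(sv_idx), "support vectors; training accuracy:", (pred == y).mean())
```

Note that the selected indices depend only on the kernel and the tolerance, not on the margin parameter C, which is the sense in which the support vectors do not change when C is varied.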
