Reduced Support Vector Machines: A Statistical Theory

In dealing with large data sets, the reduced support vector machine (RSVM) was proposed with the practical objectives of overcoming computational difficulties and reducing model complexity. In this paper, we study the RSVM from the viewpoints of sampling design, robustness, and the spectral analysis of the reduced kernel. We regard the nonlinear separating surface as a mixture of kernels. Instead of the full model, the RSVM uses a reduced mixture whose kernels are sampled from a certain candidate set. Our main results center on two themes: the robustness of the random subset mixture model and the spectral analysis of the reduced kernel. Robustness is judged by the following criteria: 1) a model variation measure; 2) the model bias (deviation) between the reduced model and the full model; and 3) the test power in distinguishing the reduced model from the full one. For the spectral analysis, we compare the eigenstructures of the full kernel matrix and of approximation kernel matrices generated by uniform random subsets. The small discrepancies between them indicate that an approximation kernel can retain most of the information in the full kernel that is relevant to learning tasks. We focus on the statistical theory of the reduced set method mainly in the context of the RSVM, but the use of a uniform random subset is not limited to the RSVM. The approach can act as a supplemental algorithm on top of a basic optimization algorithm, wherein the actual optimization takes place on the subset-approximated data; the statistical properties discussed in this paper remain valid in that setting.
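
To make the reduced-kernel construction concrete, the following is a minimal sketch, not the paper's actual algorithm: it draws a uniform random subset, forms the rectangular reduced kernel used in RSVM-style methods, builds a Nystrom-type approximation of the full Gram matrix from it, and compares the leading eigenvalues of the two matrices. The kernel choice (Gaussian RBF), the parameters gamma, n, d, m, and the helper name rbf_kernel are illustrative assumptions, not quantities specified in the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.1):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||x_i - y_j||^2)."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
n, d, m = 500, 5, 50           # full sample size, dimension, reduced-set size (illustrative)
X = rng.standard_normal((n, d))

# Full kernel matrix K(A, A): n x n.
K_full = rbf_kernel(X, X)

# Uniform random subset A~ of size m; the reduced model replaces the n x n
# kernel with the rectangular reduced kernel K(A, A~): n x m.
idx = rng.choice(n, size=m, replace=False)
K_reduced = rbf_kernel(X, X[idx])

# Nystrom-style reconstruction of the full kernel from the reduced one:
# K ~= K(A, A~) K(A~, A~)^+ K(A~, A).
K_mm = rbf_kernel(X[idx], X[idx])
K_approx = K_reduced @ np.linalg.pinv(K_mm) @ K_reduced.T

# Compare the leading eigenvalues; small discrepancies suggest the reduced
# kernel retains most of the spectral information relevant to learning.
eig_full = np.sort(np.linalg.eigvalsh(K_full))[::-1]
eig_approx = np.sort(np.linalg.eigvalsh(K_approx))[::-1]
print("full:  ", np.round(eig_full[:5], 3))
print("approx:", np.round(eig_approx[:5], 3))
```

In this sketch the eigenvalue comparison stands in for the paper's spectral analysis; in an actual RSVM, the optimization itself would be carried out directly on the n x m reduced kernel rather than on the reconstructed n x n matrix.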
