The authors are to be commended for jumping in to describe support vector machines (SVMs), not an easy thing to do since the literature on SVMs has grown at least exponentially in the last few years. A Google search for "support vector machines" gave "about 1,180,000" hits as of this writing. The authors have nevertheless made a nice selection of important points to emphasize. As noted, SVMs were proposed for classification in the early 1990s by arguments like those behind Figure 1 in their paper. The use of SVMs grew rapidly among computer scientists, as it was found that they worked very well in all kinds of practical applications. The theoretical underpinnings that accompanied the original proposals were different from those in the classical statistical literature, for example, those related to Bayes risk, and so had less impact there.

The convergence of SVMs and regularization methods (or, rather, the convergence of the "SVM community" and the "regularization community") was a major impetus in the study of the (classical) statistical properties of the SVM. One point at which this convergence took place was at an American Mathematical Society meeting at Mt. Holyoke in 1996. The speaker was describing the SVM with the so-called kernel trick when an anonymous person at the back of the room remarked that the SVM with the kernel trick was the solution to an optimization problem in a reproducing kernel Hilbert space (RKHS). Once it was clear to statisticians that the SVM can be obtained as the result of an optimization/regularization problem in an RKHS, tools known to statisticians in this context were rapidly employed to show how the SVM could be modified to take into account nonrepresentative sample sizes, unequal misclassification costs and more than two classes, and to show in each case that it directly targets the Bayes risk under very general circumstances (see also [5, 8]). Thus, a "classical" explanation of why they work so well was provided.
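For concreteness, the RKHS formulation alluded to above can be sketched as follows; the notation (the hinge loss $(u)_+$, the kernel $K$, the tuning parameter $\lambda$) is standard but is not displayed in the comment itself. With labels $y_i \in \{-1, +1\}$ and $f = b + h$, $h \in \mathcal{H}_K$, the SVM classifier is the sign of the minimizer of
\[
\frac{1}{n}\sum_{i=1}^{n}\bigl(1 - y_i f(x_i)\bigr)_+ \;+\; \lambda \,\|h\|_{\mathcal{H}_K}^{2},
\]
where $\mathcal{H}_K$ is the reproducing kernel Hilbert space with kernel $K$ and $(u)_+ = \max(u, 0)$. By the representer theorem the minimizer has the finite-dimensional form $f(x) = b + \sum_{i=1}^{n} c_i K(x, x_i)$, which is exactly the "kernel trick" SVM, and it is this regularization form that opens the problem to the classical statistical tools mentioned above.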
[1] Xiaodong Lin et al. Gene selection using support vector machines with non-convex penalty. 2005.
[2] Grace Wahba. Estimating Derivatives from Outer Space. 1969.
[3] B. Yandell et al. Automatic Smoothing of Regression Functions in Generalized Linear Models. 1986.
[4] Steve R. Gunn et al. Structural Modelling with Sparse Kernels. Machine Learning, 2002.
[5] Stephen J. Wright et al. Framework for kernel regularization with application to protein clustering. Proceedings of the National Academy of Sciences of the United States of America, 2005.
[6] Yi Lin et al. Support Vector Machines and the Bayes Rule in Classification. Data Mining and Knowledge Discovery, 2002.
[7] P. Halmos. Introduction to Hilbert Space: And the Theory of Spectral Multiplicity. 1998.
[8] N. Aronszajn. Theory of Reproducing Kernels. 1950.
[9] Yi Lin. A note on margin-based loss functions in classification. 2004.
[10] Hao Helen Zhang. Variable selection for support vector machines via smoothing spline ANOVA. 2006.
[11] J. S. Marron et al. Distance-Weighted Discrimination. 2007.
[12] G. Wahba et al. Multicategory Support Vector Machines, Theory, and Application to the Classification of Microarray Data and Satellite Radiance Data. 2004.
[13] W. Wong et al. On ψ-Learning. 2003.