GACV for support vector machines

We introduce the Generalized Approximate Cross Validation (GACV) for estimating tuning parameter(s) in SVMs. The GACV has as its target the choice of parameters which minimize the Generalized Comparative Kullback-Leibler Distance (GCKL). The GCKL is seen to be an upper bound on the expected misclassification rate. Some modest simulation examples suggest how it might work in practice. The GACV is the sum of the observed (sample) GCKL and a margin-like quantity.

Keywords: reproducing kernel Hilbert space (RKHS), GCKL, GACV

It is now common knowledge that the support vector machine (SVM) paradigm, which has proved highly successful in a number of classification studies, can be cast as a variational/regularization problem in a reproducing kernel Hilbert space (RKHS); a sketch of this form is given below. In this note, which is a sequel to [Wahba, 1999b], we look at the SVM paradigm from the point of view of a regularization problem. This view allows a comparison with penalized log likelihood methods, as well as the application of model selection and tuning approaches that have been used with those and other regularization-type algorithms to choose tuning parameters in nonparametric statistical models.

We first note the connection between the SVM paradigm in RKHS and the (dual) mathematical programming problem traditional in SVM classification. We then review the Generalized Comparative Kullback-Leibler distance (GCKL) for the usual SVM paradigm and observe that it is trivially a simple upper bound on the expected misclassification rate (a one-line sketch is given below). Next we revisit the GACV (Generalized Approximate Cross Validation), proposed in [Wahba, 1999b] as a proxy for the GCKL, and the argument that it is a reasonable estimate of the GCKL. We find that the randomization used for the GACV in [Wahba, 1999b] is not necessary: it can be replaced by an equally justifiable approximation which is readily computed exactly, along with the SVM solution to the dual mathematical programming problem. This estimate turns out, interestingly but not surprisingly, to be simply related to what several authors have identified as the (observed) VC dimension of the estimated SVM. Some preliminary simulations suggest that the minimizer of the GACV is a reasonable estimate of the minimizer of the GCKL, although further simulation and theoretical studies are warranted. It is hoped that this preliminary work will lead to a better understanding of "tuning" issues in the optimization of SVMs and related classifiers.

Reproducing kernel

Let T be an index set, …
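As a sketch of the variational form referred to in the introduction (notation assumed: H_K is the RKHS with reproducing kernel K, the decision function is f = h + b with h in H_K and b a constant, labels y_i are coded as -1 or +1, and (τ)_+ = max(τ, 0)), the SVM paradigm can be written as

\[
\min_{h \in \mathcal{H}_K,\; b \in \mathbb{R}} \;
\frac{1}{n} \sum_{i=1}^{n} \bigl(1 - y_i\,(h(x_i) + b)\bigr)_+ \;+\; \lambda \|h\|_{\mathcal{H}_K}^{2},
\]

which has the same shape as penalized log likelihood estimation, with the hinge loss in place of the negative log likelihood. The tuning parameter λ, together with any parameters inside K, is what the GACV is designed to choose.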
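The upper-bound claim for the GCKL can be sketched in one line, assuming as in [Wahba, 1999b] that the GCKL is the expectation, under the true distribution of new labels, of the sample hinge loss evaluated at the estimated f_λ:

\[
\mathrm{GCKL}(\lambda) \;=\; E_{\mathrm{true}}\,\frac{1}{n} \sum_{i=1}^{n} \bigl(1 - y_i f_\lambda(x_i)\bigr)_+
\;\ge\; E_{\mathrm{true}}\,\frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\bigl[\,y_i f_\lambda(x_i) \le 0\,\bigr],
\]

since (1 - τ)_+ ≥ 1 whenever τ ≤ 0. The right-hand side is the expected misclassification rate of the classifier sign(f_λ), which is why minimizing the GCKL is a sensible target for tuning.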
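The following Python sketch is illustrative only: it computes the observed hinge-loss term, i.e. the sample analogue of the GCKL, not the GACV itself, which adds the margin-like correction discussed above. It shows the hinge loss dominating the misclassification rate across a grid of tuning values. The dataset, the grid, and the use of scikit-learn's SVC (whose parameter C is inversely related to λ) are all arbitrary choices made for the illustration.

    # Illustrative sketch: the empirical hinge loss (sample analogue of
    # the GCKL) versus the test misclassification rate it upper-bounds.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y01 = make_classification(n_samples=400, n_features=2,
                                 n_redundant=0, random_state=0)
    y = 2 * y01 - 1                      # labels coded as -1/+1
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for C in [0.01, 0.1, 1.0, 10.0]:     # C is inversely related to lambda
        f = SVC(kernel="rbf", C=C).fit(X_tr, y_tr).decision_function(X_te)
        hinge = np.mean(np.maximum(0.0, 1.0 - y_te * f))   # sample hinge loss
        err = np.mean(y_te * f <= 0)                       # misclassification
        print(f"C={C:6.2f}  hinge={hinge:.3f}  err={err:.3f}")

In every row the hinge term is at least as large as the error rate, mirroring the inequality above; the GACV itself is designed to estimate the GCKL from the training sample alone, without recourse to held-out data.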