Leave-One-Out Bounds for Kernel Methods

In this article, we study leave-one-out style cross-validation bounds for kernel methods. The essential element in our analysis is a bound on the parameter estimation stability of regularized kernel formulations. Using this result, we derive bounds on the expected leave-one-out cross-validation error, which lead to expected generalization bounds for various kernel algorithms. In addition, we obtain variance bounds for leave-one-out errors. We apply our analysis to several classification and regression problems and compare the resulting bounds with previous results.
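As a concrete backdrop for the abstract's subject, the snippet below is a minimal sketch (not from the paper) of leave-one-out cross-validation for kernel ridge regression, the prototypical regularized kernel formulation. It uses the standard exact shortcut for ridge-type smoothers, where the LOO residual is the training residual rescaled by the hat-matrix diagonal, and checks it against brute-force refitting; the kernel, bandwidth `gamma`, and regularization `lam` are illustrative choices.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - y_j||^2)
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def loo_errors_krr(K, y, lam):
    """Exact leave-one-out residuals for kernel ridge regression via the
    hat-matrix shortcut: e_i = (y_i - yhat_i) / (1 - H_ii), H = K (K + lam I)^-1."""
    n = len(y)
    H = K @ np.linalg.inv(K + lam * np.eye(n))
    yhat = H @ y
    return (y - yhat) / (1.0 - np.diag(H))

def loo_errors_naive(K, y, lam):
    # Brute force: refit the kernel ridge solution with point i held out.
    n = len(y)
    errs = np.empty(n)
    for i in range(n):
        idx = np.delete(np.arange(n), i)
        alpha = np.linalg.solve(K[np.ix_(idx, idx)] + lam * np.eye(n - 1), y[idx])
        errs[i] = y[i] - K[i, idx] @ alpha
    return errs

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)
K = rbf_kernel(X, X, gamma=0.5)
assert np.allclose(loo_errors_krr(K, y, 1e-2), loo_errors_naive(K, y, 1e-2))
```

The shortcut makes the LOO error computable from a single fit, which is why LOO-based bounds of the kind studied in the article are practically relevant for kernel methods.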
