Learning Bounds for Kernel Regression Using Effective Data Dimensionality

Kernel methods can embed finite-dimensional data into infinite-dimensional feature spaces. Despite the large underlying feature dimensionality, kernel methods can achieve good generalization ability. This observation is often wrongly interpreted, and it has been used to argue that kernel learning can magically avoid the curse-of-dimensionality phenomenon encountered in statistical estimation problems. This letter shows that although a kernel representation embeds the data into an infinite-dimensional feature space, the effective dimensionality of this embedding, which determines the learning complexity of the underlying kernel machine, is usually small. In particular, we introduce an algebraic definition of a scale-sensitive effective dimension associated with a kernel representation. Based on this quantity, we derive upper bounds on the generalization performance of some kernel regression methods. Moreover, we show that the resulting convergence rates are optimal under various circumstances.
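To make the abstract's central quantity concrete, the following is a minimal numerical sketch, not taken from the letter itself, of a scale-sensitive effective dimension as it is commonly defined in the kernel learning literature: d(lambda) = sum_j mu_j / (mu_j + n*lambda), where mu_j are the eigenvalues of the empirical Gram matrix. The Gaussian kernel, bandwidth, sample size, and regularization values below are illustrative assumptions, not quantities fixed by the abstract.

    # Sketch: effective dimension of a Gaussian-kernel embedding (illustrative only)
    import numpy as np

    def gaussian_kernel(X, bandwidth=1.0):
        # Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * bandwidth^2))
        sq = np.sum(X**2, axis=1)
        sq_dists = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
        return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

    def effective_dimension(K, lam):
        # d(lambda) = trace(K (K + n*lambda*I)^{-1}) = sum_j mu_j / (mu_j + n*lambda)
        n = K.shape[0]
        mu = np.linalg.eigvalsh(K)
        return float(np.sum(mu / (mu + n * lam)))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))            # 200 points in R^3
    K = gaussian_kernel(X, bandwidth=1.0)    # embeds the data into an infinite-dimensional RKHS

    for lam in (1e-1, 1e-2, 1e-3):
        print(f"lambda = {lam:g}, effective dimension = {effective_dimension(K, lam):.2f}")

Although the feature space here is infinite dimensional, d(lambda) remains small and grows only slowly as the regularization parameter lambda decreases, which is the kind of behavior the letter's generalization bounds are built on.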
