Learning the Kernel Function via Regularization

We study the problem of finding an optimal kernel from a prescribed convex set of kernels K for learning a real-valued function by regularization. We establish for a wide variety of regularization functionals that this leads to a convex optimization problem and, for square loss regularization, we characterize the solution of this problem. We show that, although K may be an uncountable set, the optimal kernel is always obtained as a convex combination of at most m+2 basic kernels, where m is the number of data examples. In particular, our results apply to learning the optimal radial kernel or the optimal dot product kernel.
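As a rough illustration of the square loss case, the sketch below evaluates the minimum of the regularization functional over a one-parameter family of convex combinations of two basic kernels. Everything concrete here is an assumption made for illustration and is not taken from the paper: the synthetic data, the two Gaussian widths, the parameter name `mu`, and the helper names `gaussian_gram` and `regularization_value`. It relies on the standard regularized least squares identity that the minimum of sum_i (y_i - f(x_i))^2 + mu ||f||_K^2 equals mu y^T (K_x + mu I)^{-1} y, where K_x is the kernel matrix on the data. The grid search over the mixing weight is a simple stand-in, not the paper's method.

```python
import numpy as np

def gaussian_gram(X, sigma):
    """Gram matrix of a Gaussian (radial) kernel with width sigma."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def regularization_value(K, y, mu):
    """Minimum of the square loss regularization functional for Gram matrix K.

    Uses the identity min_f = mu * y^T (K + mu I)^{-1} y.
    """
    c = np.linalg.solve(K + mu * np.eye(len(y)), y)
    return mu * float(y @ c)

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(20, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(20)
mu = 0.1

# Two hypothetical basic kernels: a narrow and a wide Gaussian.
K1 = gaussian_gram(X, sigma=0.2)
K2 = gaussian_gram(X, sigma=2.0)

# Evaluate the functional along the segment lam * K1 + (1 - lam) * K2.
lams = np.linspace(0.0, 1.0, 11)
values = [regularization_value(lam * K1 + (1.0 - lam) * K2, y, mu) for lam in lams]
best_lam = float(lams[int(np.argmin(values))])
print("best mixing weight on the grid:", best_lam)
```

Because the functional is convex in the kernel, its value at any mixture is no larger than the corresponding mixture of the endpoint values, so a one-dimensional search along this segment has no spurious local minima.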
