Adaptive Kernel Methods Using the Balancing Principle

The choice of the regularization parameter is a fundamental problem in learning theory, since the performance of most supervised algorithms depends crucially on one or more such parameters. A central theoretical issue is how much prior knowledge is needed to choose the regularization parameter so as to obtain good learning rates. In this paper we present a parameter choice strategy, called the balancing principle, that selects the regularization parameter without knowledge of the regularity of the target function; the resulting choice adaptively achieves the best error rate. Our main result applies to regularization algorithms in a reproducing kernel Hilbert space with the square loss, and we also study how a similar principle can be used in other settings. As a straightforward corollary, we immediately obtain adaptive parameter choices for several recently studied kernel methods. Numerical experiments with the proposed parameter choice rules are also presented.
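To make the idea concrete, the following is a minimal Python sketch of a Lepskii-type balancing rule applied to kernel ridge regression: the largest regularization parameter is retained whose estimator stays within a constant multiple of the variance bound of every estimator computed with a smaller parameter. The Gaussian kernel, the geometric grid, the constant C, and the variance proxy B(lambda) proportional to 1/(sqrt(n) * lambda) are illustrative assumptions for this sketch, not the paper's exact choices.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    # Pairwise squared distances, then the Gaussian (RBF) kernel matrix.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def krr_fit(K, y, lam):
    # Kernel ridge regression: solve (K + lam * n * I) alpha = y.
    n = len(y)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def balancing_principle(X, y, lambdas, sigma=1.0, C=4.0):
    """Lepskii-type balancing rule (illustrative sketch).

    Keeps the largest lambda whose estimator lies within C * B(mu)
    of every estimator with a smaller parameter mu, where B is an
    assumed bound on the sample (variance) error.
    """
    n = len(y)
    K = gaussian_kernel(X, X, sigma)
    lambdas = np.sort(np.asarray(lambdas))          # ascending grid
    alphas = [krr_fit(K, y, lam) for lam in lambdas]
    # Assumed variance proxy B(lam) ~ 1 / (sqrt(n) * lam); illustrative only.
    B = 1.0 / (np.sqrt(n) * lambdas)

    def emp_dist(a1, a2):
        # Empirical L2 distance between the two kernel expansions,
        # evaluated at the training points.
        d = K @ (a1 - a2)
        return np.sqrt((d ** 2).mean())

    best = 0
    for i in range(1, len(lambdas)):
        if all(emp_dist(alphas[i], alphas[j]) <= C * B[j] for j in range(i)):
            best = i        # estimator still balanced against all smaller mu
        else:
            break           # first violation: stop and keep the previous lambda
    return lambdas[best], alphas[best]

if __name__ == "__main__":
    # Toy usage: noisy sine regression over a geometric lambda grid.
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, (100, 1))
    y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(100)
    lam_star, _ = balancing_principle(X, y, np.logspace(-6, 0, 20))
    print("selected lambda:", lam_star)
```

The appeal of such a rule is that it compares estimators only against one another and against a computable variance bound, so no knowledge of the target function's regularity (which governs the approximation error) is required.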
