Adaptive Kernel Methods Using the Balancing Principle

The choice of the regularization parameter is a fundamental problem in learning theory, since the performance of most supervised algorithms depends crucially on one or more such parameters. A central theoretical issue is how much prior knowledge is needed to choose the regularization parameter so as to obtain good learning rates. In this paper we present a parameter choice strategy, called the balancing principle, that selects the regularization parameter without knowledge of the regularity of the target function; the resulting choice adaptively achieves the best error rate. Our main result applies to regularization algorithms in a reproducing kernel Hilbert space with the square loss, and we also study how a similar principle can be used in other settings. As a straightforward corollary, we immediately obtain adaptive parameter choices for several recently studied kernel methods. Numerical experiments with the proposed parameter choice rules are also presented.
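To make the idea concrete, the following is a minimal Python sketch of a Lepskii-type balancing rule applied to kernel ridge regression: the largest regularization parameter is retained whose estimator stays within a constant multiple of the variance bound of every estimator computed with a smaller parameter. The Gaussian kernel, the geometric grid, the constant C, and the variance proxy B(lambda) proportional to 1/(sqrt(n) * lambda) are illustrative assumptions for this sketch, not the paper's exact choices.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    # Pairwise squared distances, then the Gaussian (RBF) kernel matrix.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def krr_fit(K, y, lam):
    # Kernel ridge regression: solve (K + lam * n * I) alpha = y.
    n = len(y)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def balancing_principle(X, y, lambdas, sigma=1.0, C=4.0):
    """Lepskii-type balancing rule (illustrative sketch).

    Keeps the largest lambda whose estimator lies within C * B(mu)
    of every estimator with a smaller parameter mu, where B is an
    assumed bound on the sample (variance) error.
    """
    n = len(y)
    K = gaussian_kernel(X, X, sigma)
    lambdas = np.sort(np.asarray(lambdas))          # ascending grid
    alphas = [krr_fit(K, y, lam) for lam in lambdas]
    # Assumed variance proxy B(lam) ~ 1 / (sqrt(n) * lam); illustrative only.
    B = 1.0 / (np.sqrt(n) * lambdas)

    def emp_dist(a1, a2):
        # Empirical L2 distance between the two kernel expansions,
        # evaluated at the training points.
        d = K @ (a1 - a2)
        return np.sqrt((d ** 2).mean())

    best = 0
    for i in range(1, len(lambdas)):
        if all(emp_dist(alphas[i], alphas[j]) <= C * B[j] for j in range(i)):
            best = i        # estimator still balanced against all smaller mu
        else:
            break           # first violation: stop and keep the previous lambda
    return lambdas[best], alphas[best]

if __name__ == "__main__":
    # Toy usage: noisy sine regression over a geometric lambda grid.
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, (100, 1))
    y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(100)
    lam_star, _ = balancing_principle(X, y, np.logspace(-6, 0, 20))
    print("selected lambda:", lam_star)
```

The appeal of such a rule is that it compares estimators only against one another and against a computable variance bound, so no knowledge of the target function's regularity (which governs the approximation error) is required.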
