Model selection of polynomial kernel regression

Polynomial kernel regression is a standard, state-of-the-art learning strategy. However, the choice of the degree of the polynomial kernel and of the regularization parameter remains an open model selection problem. The first aim of this paper is to develop a strategy for selecting these parameters. On the one hand, based on a worst-case learning rate analysis, we show that the regularization term in polynomial kernel regression is not necessary: the regularization parameter can decrease arbitrarily fast when the degree of the polynomial kernel is suitably tuned. On the other hand, when the implementation of the algorithm is taken into account, the regularization term is required. In summary, the only effect of the regularization term in polynomial kernel regression is to circumvent the ill-conditioning of the kernel matrix. Based on this, the second aim of this paper is to propose a new model selection strategy and then design an efficient learning algorithm. Both theoretical and experimental analyses show that the new strategy outperforms the previous one. Theoretically, we prove that the new learning strategy is almost optimal if the regression function is smooth. Experimentally, we show that the new strategy significantly reduces the computational burden without loss of generalization capability.
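The role of the regularization term as a remedy for an ill-conditioned kernel matrix can be illustrated with a small numerical sketch. The code below is not the selection strategy proposed in the paper; it is a generic regularized least-squares fit under assumptions chosen for illustration: the kernel form (1 + <x, z>)^s, the linear system (K + lam * m * I) alpha = y, the synthetic one-dimensional data, and all function and variable names are hypothetical.

```python
import numpy as np

def polynomial_kernel(X, Z, degree):
    """Assumed kernel form: K(x, z) = (1 + <x, z>)^degree."""
    return (1.0 + X @ Z.T) ** degree

def fit_kernel_ridge(X, y, degree, lam):
    """Solve (K + lam * m * I) alpha = y (a common regularized least-squares form)."""
    m = X.shape[0]
    K = polynomial_kernel(X, X, degree)
    return np.linalg.solve(K + lam * m * np.eye(m), y)

def predict(X_train, alpha, X_new, degree):
    """Evaluate f(x) = sum_i alpha_i K(x, x_i) at the rows of X_new."""
    return polynomial_kernel(X_new, X_train, degree) @ alpha

# Synthetic data (illustrative only): noisy samples of a smooth function.
rng = np.random.default_rng(0)
m, degree, lam = 60, 5, 1e-8
X = rng.uniform(-1.0, 1.0, size=(m, 1))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(m)

# With m samples in one dimension, the degree-s kernel matrix has rank at
# most s + 1, so for m > s + 1 it is singular up to rounding error.
K = polynomial_kernel(X, X, degree)
rank_K = np.linalg.matrix_rank(K)
cond_plain = np.linalg.cond(K)
cond_reg = np.linalg.cond(K + lam * m * np.eye(m))

# A very small regularization parameter already makes the system solvable.
alpha = fit_kernel_ridge(X, y, degree, lam)
train_mse = np.mean((predict(X, alpha, X, degree) - y) ** 2)

X_test = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
y_test = np.sin(np.pi * X_test[:, 0])
test_mse = np.mean((predict(X, alpha, X_test, degree) - y_test) ** 2)

print("rank of K:", rank_K, "out of", m)
print("condition number without regularization:", cond_plain)
print("condition number with regularization:", cond_reg)
print("training MSE:", train_mse, "test MSE:", test_mse)
```

In this sketch the regularization parameter is kept tiny, so its contribution to the fit itself is negligible; it mainly serves to make the linear system numerically solvable, which mirrors the abstract's point that the degree, rather than the regularization parameter, carries the model selection burden.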
