Model Selection with the Covering Number of the Ball of RKHS

Model selection in kernel methods is the problem of choosing an appropriate hypothesis space for kernel-based learning algorithms so that the resulting hypothesis neither underfits nor overfits. One of the main difficulties in model selection is controlling the sample complexity when designing the selection criterion. In this paper, we take balls of reproducing kernel Hilbert spaces (RKHSs) as candidate hypothesis spaces and propose a novel model selection criterion that minimizes the empirical optimal error in the ball of an RKHS together with the covering number of the ball. By introducing the covering number to measure the capacity of the ball of an RKHS, our criterion can directly control the sample complexity. Specifically, we first prove a relation between the expected optimal error and the empirical optimal error in the ball of an RKHS. Using this relation as the theoretical foundation, we define our criterion. Then, by estimating the expectation of the empirical optimal error and proving an upper bound on the covering number, we represent the criterion as a functional of the kernel matrix. We further develop an efficient algorithm that evaluates this functional approximately, so that the fast Fourier transform (FFT) can be applied to achieve quasi-linear computational complexity, and we prove the consistency between the approximate criterion and the accurate one for sufficiently large samples. Finally, we empirically evaluate the performance of our criterion and verify the consistency between the approximate and accurate criteria.
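The criterion itself is not stated in closed form in this abstract, but the FFT-based evaluation it describes is in the spirit of circulant approximations of kernel matrices, whose eigenvalues can be obtained from a single FFT of the first row. The sketch below is a minimal illustration of that idea only, under our own assumptions: the Gaussian kernel, the regular 1-D grid, and the placeholder functional F(K) = sum_i log(1 + mu_i / lambda) are not the criterion derived in the paper.

```python
# Illustrative sketch only (not the paper's algorithm): evaluate a spectral
# functional of a kernel matrix in quasi-linear time by replacing the kernel
# matrix with a circulant approximation whose eigenvalues come from one FFT.
import numpy as np


def circulant_first_row(n, h, sigma):
    """First row of a symmetric circulant approximation of the Gaussian
    kernel matrix on a regular 1-D grid with spacing h (an assumption made
    for this sketch; the paper is not restricted to this setting)."""
    j = np.arange(n)
    d = np.minimum(j, n - j) * h          # wrap-around grid distances
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))


def circulant_eigenvalues(first_row):
    """Eigenvalues of the circulant matrix built from `first_row`.
    For a symmetric first row the FFT is real up to round-off error."""
    return np.fft.fft(first_row).real


def approximate_spectral_functional(n=1024, h=1.0 / 1024, sigma=0.1, lam=1e-2):
    """Placeholder functional F(K) = sum_i log(1 + mu_i / lam), evaluated on
    the circulant spectrum. The paper's criterion is a different functional
    of the kernel matrix; this one is purely for illustration."""
    mu = circulant_eigenvalues(circulant_first_row(n, h, sigma))
    mu = np.clip(mu, 0.0, None)           # guard against tiny negative round-off
    return float(np.sum(np.log1p(mu / lam)))


if __name__ == "__main__":
    # The functional shrinks as the kernel width sigma grows and the
    # effective capacity of the kernel matrix spectrum concentrates.
    for sigma in (0.02, 0.1, 0.5):
        print(f"sigma = {sigma:4.2f}  F = {approximate_spectral_functional(sigma=sigma):10.2f}")
```

Because the circulant spectrum is obtained from one length-n FFT, evaluating such a spectral functional costs O(n log n) rather than the O(n^3) of a full eigendecomposition of the kernel matrix, which is consistent with the quasi-linear complexity claimed above.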
