Optimizing Sparse Kernel Ridge Regression hyperparameters based on leave-one-out cross-validation

Kernel ridge regression (KRR) is a nonlinear extension of ridge regression. The performance of KRR depends on its hyperparameters, such as the penalty factor C and the RBF kernel parameter sigma. We employ a method called MCV-KRR, which optimizes the KRR hyperparameters so that the leave-one-out cross-validation error is minimized. This method is equivalent to a predictive approach to Gaussian process regression. Since the cost of KRR training is O(N^3), where N is the data size, sparse approximations of KRR have recently been studied to reduce this complexity. In this paper, we apply the minimum cross-validation (MCV) approach to such sparse approximations. Our experiments show that MCV combined with a sparse approximation of KRR achieves almost the same generalization performance as MCV-KRR at much lower cost.
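The idea above can be illustrated with a minimal sketch of exact leave-one-out cross-validation for full (non-sparse) KRR. For ridge-type estimators the LOO residuals have the closed form e_i = (y_i - f(x_i)) / (1 - H_ii), where H = K (K + lam*I)^{-1} is the hat matrix, so no model needs to be retrained N times. This is an assumption-laden toy illustration, not the paper's MCV gradient procedure or its sparse approximation; the ridge penalty is written here as lam (the paper parameterizes regularization via C), and the data and grid values are hypothetical.

```python
import numpy as np

def rbf_kernel(X, sigma):
    # Pairwise squared Euclidean distances, then Gaussian (RBF) kernel.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def loo_error(X, y, sigma, lam):
    """Exact leave-one-out MSE for kernel ridge regression.

    Uses e_i = (y_i - f(x_i)) / (1 - H_ii), where
    H = K (K + lam*I)^{-1} is the hat (smoother) matrix,
    so the LOO error costs one O(N^3) solve instead of N of them.
    """
    n = len(y)
    K = rbf_kernel(X, sigma)
    H = K @ np.linalg.inv(K + lam * np.eye(n))
    residuals = y - H @ y
    loo = residuals / (1.0 - np.diag(H))
    return float(np.mean(loo ** 2))

# Hypothetical toy data and a small grid over (sigma, lam).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(80, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(80)

grid = [(s, l) for s in (0.1, 0.5, 1.0, 2.0) for l in (1e-3, 1e-2, 1e-1)]
best = min(grid, key=lambda p: loo_error(X, y, *p))
print("best (sigma, lam):", best)
```

The MCV approach described in the abstract goes further by treating this LOO error as a differentiable objective in (C, sigma) and minimizing it directly rather than scanning a grid.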
