Optimizing Sparse Kernel Ridge Regression hyperparameters based on leave-one-out cross-validation

Kernel ridge regression (KRR) is a nonlinear extension of ridge regression. The performance of KRR depends on its hyperparameters, such as the penalty factor C and the RBF kernel parameter sigma. We employ a method called MCV-KRR, which optimizes the KRR hyperparameters so that the leave-one-out cross-validation error is minimized. This method is equivalent to a predictive approach to Gaussian process regression. Since the cost of KRR training is O(N^3), where N is the data size, sparse approximations of KRR have recently been studied to reduce this complexity. In this paper, we apply the minimum cross-validation (MCV) approach to such sparse approximations. Our experiments show that MCV combined with a sparse approximation of KRR achieves almost the same generalization performance as MCV-KRR at much lower cost.
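The idea above can be illustrated with a minimal sketch of exact leave-one-out cross-validation for full (non-sparse) KRR. For ridge-type estimators the LOO residuals have the closed form e_i = (y_i - f(x_i)) / (1 - H_ii), where H = K (K + lam*I)^{-1} is the hat matrix, so no model needs to be retrained N times. This is an assumption-laden toy illustration, not the paper's MCV gradient procedure or its sparse approximation; the ridge penalty is written here as lam (the paper parameterizes regularization via C), and the data and grid values are hypothetical.

```python
import numpy as np

def rbf_kernel(X, sigma):
    # Pairwise squared Euclidean distances, then Gaussian (RBF) kernel.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def loo_error(X, y, sigma, lam):
    """Exact leave-one-out MSE for kernel ridge regression.

    Uses e_i = (y_i - f(x_i)) / (1 - H_ii), where
    H = K (K + lam*I)^{-1} is the hat (smoother) matrix,
    so the LOO error costs one O(N^3) solve instead of N of them.
    """
    n = len(y)
    K = rbf_kernel(X, sigma)
    H = K @ np.linalg.inv(K + lam * np.eye(n))
    residuals = y - H @ y
    loo = residuals / (1.0 - np.diag(H))
    return float(np.mean(loo ** 2))

# Hypothetical toy data and a small grid over (sigma, lam).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(80, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(80)

grid = [(s, l) for s in (0.1, 0.5, 1.0, 2.0) for l in (1e-3, 1e-2, 1e-1)]
best = min(grid, key=lambda p: loo_error(X, y, *p))
print("best (sigma, lam):", best)
```

The MCV approach described in the abstract goes further by treating this LOO error as a differentiable objective in (C, sigma) and minimizing it directly rather than scanning a grid.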
