Efficient Hold-Out for Subset of Regressors

Hold-out and cross-validation are among the most useful methods for model selection and performance assessment of machine learning algorithms. In this paper, we present a computationally efficient algorithm for calculating the hold-out performance of sparse regularized least-squares (RLS) in the case where the method has already been trained on the whole training set. The computational complexity of performing the hold-out is O(|H|^3 + |H|^2 n), where |H| is the size of the hold-out set and n is the number of basis vectors. The algorithm can therefore be used to calculate various types of cross-validation estimates efficiently. For example, when m is the number of training examples, the complexities of N-fold and leave-one-out cross-validation are O(m^3/N^2 + m^2 n/N) and O(mn), respectively. Further, since sparse RLS can be trained in O(mn^2) time for several regularization parameter values in parallel, the fast hold-out algorithm enables efficient selection of the optimal parameter value.
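
The abstract does not spell out the algorithm itself, but the kind of identity that makes such fast hold-out computations possible can be sketched for the ordinary (non-sparse) kernel RLS case: the hold-out predictions for an index set H can be recovered from the full-data model via a |H| x |H| linear system built from the hat matrix G = K(K + lambda*I)^{-1}. The snippet below is an illustrative sketch only, not the paper's subset-of-regressors algorithm; the function name, the toy data, and the linear kernel are assumptions made for the example.

```python
import numpy as np

def holdout_predictions(K, y, lam, H):
    """Hold-out predictions for the examples indexed by H, obtained from the
    RLS model trained on all m examples, without retraining.

    Uses the identity  f_tilde_H = (I - G_HH)^{-1} (f_H - G_HH y_H),
    where G = K (K + lam*I)^{-1} is the hat matrix of full-data RLS.
    (Illustrative sketch for dense kernel RLS, not the sparse RLS algorithm.)
    """
    m = K.shape[0]
    A = K + lam * np.eye(m)
    a = np.linalg.solve(A, y)              # dual coefficients of the full-data model
    f = K @ a                              # full-data predictions on the training set
    # Only the |H| x |H| block of the hat matrix is needed.
    G_cols_H = np.linalg.solve(A, K[:, H])  # columns H of G, since A^{-1} and K commute
    G_HH = G_cols_H[H, :]
    I_H = np.eye(len(H))
    return np.linalg.solve(I_H - G_HH, f[H] - G_HH @ y[H])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, lam = 50, 1.0
    X = rng.standard_normal((m, 3))
    y = rng.standard_normal(m)
    K = X @ X.T                            # linear kernel, just for the demo
    H = [3, 7, 11]

    fast = holdout_predictions(K, y, lam, H)

    # Brute-force check: retrain on the complement of H and predict on H.
    rest = np.setdiff1d(np.arange(m), H)
    a_rest = np.linalg.solve(K[np.ix_(rest, rest)] + lam * np.eye(len(rest)), y[rest])
    slow = K[np.ix_(H, rest)] @ a_rest
    print(np.allclose(fast, slow))         # True: both give the same hold-out predictions
```

The cost of the fast route is dominated by forming the |H| x |H| block and solving the small system, which is where the O(|H|^3 + |H|^2 n) term in the abstract comes from once the full-data model and the relevant intermediate quantities are cached.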
