EXACT AND EFFICIENT LEAVE-PAIR-OUT CROSS-VALIDATION FOR RANKING RLS

In this paper, we introduce an efficient cross-validation algorithm for RankRLS, a kernel-based ranking algorithm. Cross-validation (CV) is one of the most useful methods for model selection and performance assessment of machine learning algorithms, especially when the amount of labeled data is small. A natural way to measure the performance of a ranking algorithm by CV is to hold out each pair of data points from the training set in turn and measure the performance on the held-out pair. This approach is known as leave-pair-out cross-validation (LPOCV). We present a computationally efficient algorithm for performing LPOCV for RankRLS. If RankRLS has already been trained on the whole training set of m examples, the computational complexity of the algorithm is O(m). Further, if there are d outputs to be learned simultaneously, the computational complexity of performing LPOCV is O(md). An approximate O(m) time LPOCV algorithm for RankRLS has been proposed previously, but our method is the first exact solution to this problem. We also introduce a general framework for developing and analysing hold-out and cross-validation techniques for quadratically regularized kernel-based learning algorithms. The framework is constructed using a variant of the representer theorem based on value regularization, for which we provide a simple proof using matrix calculus. Our cross-validation algorithm can be seen as an instance of this framework.
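To make the LPOCV setting concrete, the following is a minimal sketch of the naive baseline that the paper's exact shortcut replaces: every pair of training examples with differing labels is held out in turn, the learner is retrained on the remaining data, and the held-out pair is scored as correctly or incorrectly ranked. The sketch uses plain kernel regularized least squares as a stand-in base learner; the function names, the linear-kernel toy data, and the retraining loop are illustrative assumptions, not the paper's O(m) algorithm.

```python
# Naive leave-pair-out cross-validation (LPOCV): retrain for every held-out pair.
# This is the O(m^2) baseline; the paper's contribution avoids the retraining loop.
import numpy as np
from itertools import combinations

def rls_fit_predict(K_train, y_train, K_test, reg=1.0):
    """Train regularized least squares on a kernel matrix and predict on test rows."""
    m = K_train.shape[0]
    alpha = np.linalg.solve(K_train + reg * np.eye(m), y_train)
    return K_test @ alpha

def naive_lpocv(K, y, reg=1.0):
    """Hold each pair of examples out in turn and count correctly ranked pairs."""
    m = len(y)
    correct, total = 0, 0
    for i, j in combinations(range(m), 2):
        if y[i] == y[j]:
            continue  # pairs with equal labels impose no ranking constraint
        keep = [k for k in range(m) if k not in (i, j)]
        K_tr = K[np.ix_(keep, keep)]
        K_te = K[np.ix_([i, j], keep)]
        p_i, p_j = rls_fit_predict(K_tr, y[keep], K_te, reg)
        # the held-out pair is ranked correctly if the predictions preserve the label order
        correct += ((p_i - p_j) * (y[i] - y[j]) > 0)
        total += 1
    return correct / total

# Toy usage: linear kernel on random regression data.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=30)
K = X @ X.T
print("LPOCV pairwise accuracy:", naive_lpocv(K, y))
```

The fraction of correctly ranked held-out pairs computed this way is the quantity the efficient LPOCV algorithm evaluates exactly without retraining the model for each pair.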
