Large-scale training methods for linear RankRLS

RankRLS is a recently proposed state-of-the-art method for learning ranking functions by minimizing a pairwise ranking error. The method can be trained by solving a system of linear equations. In this work, we investigate the use of conjugate gradient and regularization by iteration for training linear RankRLS on very large, high-dimensional, but sparse data sets. Such data are typically encountered, for example, in applications based on natural language data. We show that even though training RankRLS optimizes a pairwise loss function, the computational complexity of the proposed methods, when learning from data with utility scores, is O(tms), where t is the required number of iterations, m the number of training examples, and s the average number of non-zero features per example. The complexity of learning from pairwise preferences is O(tms + tl), where l is the number of observed preferences in the training set. Our experiments further confirm that restricting the number of conjugate gradient iterations has a regularizing effect and that the number of iterations yielding optimal results is, in practice, a small constant. Thus, regularization by iteration achieves performance similar to the better-known Tikhonov regularization while dramatically reducing the computational cost of training and parameter selection.
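
To make the complexity argument concrete, the sketch below shows one way linear RankRLS could be trained with conjugate gradient and early stopping in the utility-score setting. It assumes the all-pairs case, where the pairwise squared loss equals (y - Xw)^T L (y - Xw) with the complete-graph Laplacian L = mI - 11^T, so each matrix-vector product with X^T L X costs O(ms) for sparse X. The function name and interface are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def train_rankrls_cg(X, y, t):
    """Minimal sketch: linear RankRLS trained with conjugate gradient,
    regularized by capping the iteration count at t.

    X -- sparse (m, d) feature matrix (scipy.sparse)
    y -- (m,) array of utility scores
    t -- number of CG iterations (acts as the regularization parameter)
    """
    m, d = X.shape

    # Laplacian of the complete preference graph: L = m*I - 1*1^T.
    # Its product with a vector costs only O(m).
    def lap(v):
        return m * v - v.sum()

    # Product with the system matrix X^T L X: two sparse matrix-vector
    # products plus the O(m) Laplacian step, i.e. O(ms) per CG iteration.
    def matvec(w):
        return X.T @ lap(X @ w)

    A = LinearOperator((d, d), matvec=matvec, dtype=np.float64)
    b = X.T @ lap(y)

    # Solve X^T L X w = X^T L y, stopping after t iterations.
    w, _ = cg(A, b, maxiter=t)
    return w
```

Capping CG at t iterations is the regularization-by-iteration scheme: t plays the role of the Tikhonov parameter, so no separate regularization constant needs to be tuned. For pairwise preference data, one would substitute the Laplacian of the observed preference graph, whose matrix-vector product costs O(l) for l preferences, which yields the stated O(tms + tl) complexity.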
