Efficient Learning of Sparse Ranking Functions

Algorithms for learning to rank can be inefficient when they employ risk functions that use structural information. We describe and analyze a learning algorithm that efficiently learns a ranking function using a domination loss. This loss is designed for problems in which a small number of positive examples must be ranked above a vast number of negative examples. In that context, we propose an efficient coordinate descent approach that scales linearly with the number of examples. We then present an extension that incorporates regularization, thus extending Vapnik’s notion of regularized empirical risk minimization to learning to rank. We also discuss an extension to the case of multi-value feedback. Experiments on several benchmark datasets and on large-scale Google internal datasets demonstrate that the algorithm constructs compact models while retaining empirical accuracy.
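The exact domination loss and update rule appear in the full paper; as a minimal illustration of the idea, the sketch below runs coordinate descent on a softmax-style surrogate that penalizes any negative example scoring close to, or above, a positive example. All function names, the step size, and the toy setup are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def domination_loss(w, X_pos, X_neg):
    """Softmax-style surrogate: every positive should outscore every negative.

    Illustrative stand-in for a domination loss, summing
    log(1 + sum_n exp(s_n - s_p)) over all positives p.
    """
    s_pos = X_pos @ w
    s_neg = X_neg @ w
    return np.sum(np.log1p(np.exp(s_neg[None, :] - s_pos[:, None]).sum(axis=1)))

def coordinate_descent(X_pos, X_neg, n_epochs=50, step=0.1):
    """Cycle over coordinates, taking a gradient step on one weight at a time."""
    d = X_pos.shape[1]
    w = np.zeros(d)
    for _ in range(n_epochs):
        for j in range(d):
            s_pos = X_pos @ w
            s_neg = X_neg @ w
            E = np.exp(s_neg[None, :] - s_pos[:, None])   # shape (P, N)
            denom = 1.0 + E.sum(axis=1, keepdims=True)
            # Partial derivative of the surrogate w.r.t. the single weight w_j.
            grad_j = np.sum(E * (X_neg[None, :, j] - X_pos[:, None, j]) / denom)
            w[j] -= step * grad_j
    return w
```

Each coordinate update touches every example once, so a full sweep is linear in the number of examples, consistent with the scaling claimed above; the paper's actual updates additionally exploit the structure of the loss and add regularization.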