Listwise approach to learning to rank: theory and algorithm

This paper studies the listwise approach to learning to rank. The listwise approach learns a ranking function by taking individual lists as instances and minimizing a loss function defined on the predicted list and the ground-truth list. Existing work on the approach has mainly focused on developing new algorithms; methods such as RankCosine and ListNet have been proposed and have shown good performance. The underlying theory, however, has not yet been sufficiently studied. To address this, the paper proposes analyzing learning to rank algorithms theoretically by investigating the properties of their loss functions, including consistency, soundness, continuity, differentiability, convexity, and efficiency. A sufficient condition on consistency for ranking is given, which seems to be the first such result obtained in related research. The paper then analyzes three loss functions: likelihood loss, cosine loss, and cross entropy loss. The latter two are used in RankCosine and ListNet, respectively. The likelihood loss leads to a new listwise method called ListMLE, whose loss function offers better properties and also leads to better experimental results.
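To make the likelihood loss concrete, the following is a minimal sketch, not the authors' implementation. It assumes the loss is the negative log-likelihood of the ground-truth permutation under a Plackett-Luce model with exponential scores, which is the form commonly associated with ListMLE; the function name `listmle_loss` and its arguments are illustrative choices, not taken from the paper.

```python
import math

def listmle_loss(scores, ranking):
    """Negative log-likelihood of `ranking` under a Plackett-Luce model.

    scores[i] is the model score of item i (assumed to be a plain float).
    ranking lists item indices from best to worst (the ground-truth order).
    """
    # Reorder the scores so position 0 holds the top-ranked item.
    ordered = [scores[i] for i in ranking]
    loss = 0.0
    for i in range(len(ordered)):
        tail = ordered[i:]
        # Log-sum-exp over the items not yet placed, shifted by the max
        # for numerical stability.
        m = max(tail)
        log_z = m + math.log(sum(math.exp(s - m) for s in tail))
        # -log of the probability that item i is picked next from the tail.
        loss += log_z - ordered[i]
    return loss

# Scores that agree with the ground-truth order give a lower loss
# than scores that contradict it.
agree = listmle_loss([3.0, 2.0, 1.0], [0, 1, 2])
disagree = listmle_loss([3.0, 2.0, 1.0], [2, 1, 0])
```

Under this assumed form, each term is a log-sum-exp minus a linear function of the scores, so the loss is continuous, differentiable, and convex in the scores, and it can be computed in time linear in the list length by accumulating the tail sums from the bottom of the list upward (the loop above recomputes them for clarity).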
