Learning to Rank for Information Retrieval

Learning to rank for Information Retrieval (IR) is the task of automatically constructing a ranking model from training data, such that the model can sort new objects according to their degree of relevance, preference, or importance. Many IR problems are by nature ranking problems, and many IR technologies can potentially be enhanced by learning-to-rank techniques. The objective of this tutorial is to give an introduction to this research direction. Specifically, the existing learning-to-rank algorithms are reviewed and categorized into three approaches: the pointwise, pairwise, and listwise approaches. The advantages and disadvantages of each approach are analyzed, and the relationships between the loss functions used in these approaches and IR evaluation measures are discussed. Empirical evaluations of typical learning-to-rank methods are then presented, with the LETOR collection as the benchmark dataset; the results suggest that the listwise approach is the most effective of the three. After that, a statistical ranking theory is introduced, which can describe different learning-to-rank algorithms and can be used to analyze their query-level generalization abilities. At the end of the tutorial, we provide a summary and discuss potential future work on learning to rank.
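To make the three-way categorization concrete, the following minimal Python sketch shows one illustrative loss per approach for a single query. The specific losses are assumptions chosen for brevity, not the tutorial's own definitions: squared error for the pointwise case, a logistic loss on score differences for the pairwise case, and a cross entropy between softmax ("top-one") distributions for the listwise case. The function names and the toy scores and labels are hypothetical.

```python
import math


def pointwise_loss(scores, labels):
    """Pointwise: treat each document independently (mean squared error)."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)


def pairwise_loss(scores, labels):
    """Pairwise: logistic loss over pairs (i, j) where i is more relevant than j."""
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                # Small when document i is scored well above document j.
                total += math.log1p(math.exp(-(scores[i] - scores[j])))
                pairs += 1
    return total / pairs if pairs else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of numbers."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def listwise_loss(scores, labels):
    """Listwise: cross entropy between softmax distributions of labels and scores."""
    p = softmax([float(y) for y in labels])
    q = softmax(scores)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


# Toy query with three documents; labels are graded relevance (assumed data).
labels = [2, 1, 0]
good_scores = [3.0, 1.0, -1.0]   # ranks documents in the labeled order
bad_scores = [-1.0, 1.0, 3.0]    # ranks documents in the reverse order

for name, loss in [("pointwise", pointwise_loss),
                   ("pairwise", pairwise_loss),
                   ("listwise", listwise_loss)]:
    print(name, loss(good_scores, labels), loss(bad_scores, labels))
```

In this toy setup all three losses are lower for the correctly ordered scores than for the reversed ones. What differs is the unit each loss is defined on: a single document, a pair of documents, or the whole list for a query.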
