Learning to Rank for Information Retrieval

Learning to rank for Information Retrieval (IR) is the task of automatically constructing a ranking model from training data, such that the model can sort new objects according to their degree of relevance, preference, or importance. Many IR problems are ranking problems by nature, and many IR technologies can potentially be enhanced by learning-to-rank techniques. The objective of this tutorial is to introduce this research direction. Specifically, existing learning-to-rank algorithms are reviewed and categorized into three approaches: pointwise, pairwise, and listwise. The advantages and disadvantages of each approach are analyzed, and the relationships between the loss functions used in these approaches and IR evaluation measures are discussed. Empirical evaluations of typical learning-to-rank methods are then presented, using the LETOR collection as a benchmark dataset; the results seem to suggest that the listwise approach is the most effective of the three. After that, a statistical ranking theory is introduced, which can describe different learning-to-rank algorithms and be used to analyze their query-level generalization abilities. At the end of the tutorial, we provide a summary and discuss potential future work on learning to rank.

Tie-Yan Liu
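
The three approaches named in the abstract differ mainly in what a single loss term looks at: one document, a pair of documents, or the whole ranked list for a query. The sketch below is only an illustration under stated assumptions, not the tutorial's own formulation. It uses a squared-error loss for the pointwise case, a logistic loss on score differences (in the style of RankNet) for the pairwise case, and a softmax cross entropy (in the style of ListNet's top-one probability) for the listwise case. NDCG stands in for the IR evaluation measures. All function names are hypothetical.

```python
import math


def pointwise_loss(scores, labels):
    """Pointwise: mean squared error between each score and its label."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)


def pairwise_loss(scores, labels):
    """Pairwise: mean logistic loss over pairs (i, j) with labels[i] > labels[j]."""
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                total += math.log(1.0 + math.exp(-(scores[i] - scores[j])))
                pairs += 1
    return total / pairs if pairs else 0.0


def _softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def listwise_loss(scores, labels):
    """Listwise: cross entropy between softmax(labels) and softmax(scores)."""
    p = _softmax(labels)
    q = _softmax(scores)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


def ndcg(scores, labels, k=None):
    """NDCG@k of the ranking induced by sorting documents by descending score."""
    k = len(scores) if k is None else k
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(pos + 2)
                   for pos, r in enumerate(rels[:k]))

    best = dcg(sorted(labels, reverse=True))
    return dcg([labels[i] for i in order]) / best if best > 0 else 0.0


if __name__ == "__main__":
    labels = [2, 1, 0]            # graded relevance for one query (made-up data)
    good = [3.0, 2.0, 1.0]        # scores that agree with the labels
    bad = [1.0, 2.0, 3.0]         # scores that reverse the ideal order
    for name, s in (("good", good), ("bad", bad)):
        print(name, pointwise_loss(s, labels), pairwise_loss(s, labels),
              listwise_loss(s, labels), ndcg(s, labels))
```

On this toy query, the pairwise and listwise losses are lower for the correctly ordered scores than for the reversed ones, while NDCG is higher. The pointwise loss depends on the absolute score values rather than on the order, which is one reason the relationship between surrogate losses and evaluation measures deserves separate analysis.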
