Rankboost+: an improvement to Rankboost

Rankboost is a well-known algorithm that iteratively creates and aggregates a collection of “weak rankers” to build an effective ranking procedure. The initial work on Rankboost proposed two variants. One variant, which we call Rb-d, is designed for the scenario where all weak rankers have the binary range $$\{0,1\}$$; it has good theoretical properties but does not perform well in practice. The other, which we call Rb-c, has good empirical behavior and is the recommended variant for this binary weak ranker scenario, but it lacks theoretical grounding. In this paper, we rectify this situation by proposing an improved Rankboost algorithm for the binary weak ranker scenario, which we call Rankboost$$+$$. We prove that this approach is theoretically sound and show empirically that it outperforms both Rankboost variants. Further, the theory behind Rankboost$$+$$ helps explain two observations from prior work: why Rb-d may not perform well in practice, and why Rb-c is better behaved in the binary weak ranker scenario.
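To make the "iteratively creates and aggregates weak rankers" loop concrete, the following is a minimal sketch of a generic pairwise boosting round with binary weak rankers. It is not the Rankboost$$+$$ algorithm, and it is not claimed to match Rb-d or Rb-c exactly; the weight formula $$\alpha = \tfrac{1}{2}\ln\frac{1+r}{1-r}$$ is one standard choice from the boosting literature, assumed here for illustration. The names `rankboost_sketch`, `pairs`, and `weak_rankers` are hypothetical.

```python
import math


def rankboost_sketch(pairs, weak_rankers, rounds=10):
    """Illustrative pairwise boosting loop (assumed formulation).

    pairs: list of (lo, hi) items, meaning hi should be ranked above lo.
    weak_rankers: list of functions mapping an item to 0 or 1.
    Returns a scoring function; a higher score means a higher rank.
    """
    n = len(pairs)
    dist = [1.0 / n] * n  # distribution over preference pairs
    ensemble = []         # list of (alpha, weak ranker)

    for _ in range(rounds):
        # r measures how well h agrees with the weighted preferences.
        def correlation(h):
            return sum(d * (h(hi) - h(lo)) for d, (lo, hi) in zip(dist, pairs))

        best = max(weak_rankers, key=lambda h: abs(correlation(h)))
        r = correlation(best)
        if abs(r) < 1e-12:
            break  # no weak ranker is informative any more
        r = max(min(r, 1 - 1e-9), -1 + 1e-9)  # keep the log finite
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        ensemble.append((alpha, best))

        # Down-weight pairs the chosen ranker orders correctly,
        # up-weight pairs it orders incorrectly, then renormalize.
        dist = [d * math.exp(alpha * (best(lo) - best(hi)))
                for d, (lo, hi) in zip(dist, pairs)]
        z = sum(dist)
        dist = [d / z for d in dist]

    return lambda x: sum(a * h(x) for a, h in ensemble)
```

Calling `rankboost_sketch` with a list of preference pairs and a pool of $$\{0,1\}$$-valued functions returns a scorer that is a weighted sum of the selected weak rankers. The variants discussed in the abstract differ in how the weak ranker and its weight are chosen in each round.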
