Bagging gradient-boosted trees for high precision, low variance ranking models

Recent studies have shown that boosting provides excellent predictive performance across a wide variety of tasks. In learning-to-rank, boosted models such as RankBoost and LambdaMART are among the best-performing learning methods in evaluations on public data sets. In this paper, we show how combining bagging as a variance-reduction technique with boosting as a bias-reduction technique yields ranking models with very high precision and low variance. We perform thousands of parameter-tuning experiments for LambdaMART to obtain a high-precision boosting model, and then show that a bagged ensemble of such LambdaMART boosted models achieves higher ranking accuracy while reducing variance by as much as 50%. We report results on three public learning-to-rank data sets using four metrics. Bagged LambdaMART outperforms all previously reported results on ten of the twelve comparisons, and it outperforms non-bagged LambdaMART on all twelve. For example, wrapping bagging around LambdaMART increases NDCG@1 from 0.4137 to 0.4200 on the MQ2007 data set; the best prior result in the literature for this data set is 0.4134, achieved by RankBoost.
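To make the bagging-around-boosting recipe concrete, the sketch below trains several LambdaMART-style rankers on query-level bootstrap samples and averages their scores at prediction time. It is a minimal illustration, not the authors' implementation: it assumes LightGBM's LGBMRanker (with the lambdarank objective) as a stand-in for LambdaMART, and the toy data, bag count, and hyperparameters are placeholders.

```python
# Minimal sketch of bagging LambdaMART-style rankers (illustrative only).
import numpy as np
from lightgbm import LGBMRanker

rng = np.random.default_rng(0)

# Toy data: 100 queries with 10 documents each, 20 features, graded relevance in {0, 1, 2}.
n_queries, docs_per_query, n_features = 100, 10, 20
X = rng.normal(size=(n_queries * docs_per_query, n_features))
y = rng.integers(0, 3, size=n_queries * docs_per_query)
group = np.full(n_queries, docs_per_query)  # documents per query, in query order


def bagged_lambdamart(X, y, group, n_bags=10, seed=0):
    """Train n_bags boosted rankers, each on a bootstrap sample drawn at the query level."""
    rng = np.random.default_rng(seed)
    starts = np.concatenate(([0], np.cumsum(group)))  # row offset of each query's block
    models = []
    for _ in range(n_bags):
        picked = rng.integers(0, len(group), size=len(group))  # sample queries with replacement
        rows = np.concatenate([np.arange(starts[q], starts[q + 1]) for q in picked])
        model = LGBMRanker(objective="lambdarank", n_estimators=100,
                           learning_rate=0.1, random_state=seed)
        model.fit(X[rows], y[rows], group=group[picked])
        models.append(model)
    return models


def bagged_scores(models, X):
    """The bagged ranker's score is the mean of the individual models' scores."""
    return np.mean([m.predict(X) for m in models], axis=0)


models = bagged_lambdamart(X, y, group)
print(bagged_scores(models, X)[:5])
```

Bootstrapping whole queries rather than individual documents keeps each query's documents together, which is what a listwise or pairwise ranking loss needs during training.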

[1] Quoc V. Le, et al. Learning to Rank with Nonsmooth Cost Functions, 2006, Neural Information Processing Systems.

[2] Jaana Kekäläinen, et al. IR evaluation methods for retrieving highly relevant documents, 2000, SIGIR '00.

[3] Tie-Yan Liu, et al. Learning to rank for information retrieval, 2009, SIGIR.

[4] Leo Breiman, et al. Using Iterated Bagging to Debias Regressions, 2001, Machine Learning.

[5] J. Friedman. Greedy function approximation: A gradient boosting machine, 2001, Annals of Statistics.

[6] Geoffrey I. Webb, et al. MultiBoosting: A Technique for Combining Boosting and Wagging, 2000, Machine Learning.

[7] Hsuan-Tien Lin, et al. An Ensemble Ranking Solution for the Yahoo! Learning to Rank Challenge, 2010.

[8] Gregory N. Hullender, et al. Learning to rank using gradient descent, 2005, ICML.

[9] Mingrui Wu, et al. Gradient descent optimization of smoothed information retrieval metrics, 2010, Information Retrieval.

[10] Tao Qin, et al. FRank: a ranking method with fidelity loss, 2007, SIGIR.

[11] Yoram Singer, et al. An Efficient Boosting Algorithm for Combining Preferences, 2003, Journal of Machine Learning Research.

[12] Dmitry Yurievich Pavlov, et al. BagBoo: a scalable hybrid bagging-the-boosting model, 2010, CIKM '10.

[13] D. Sculley. Combined regression and ranking, 2010, KDD.

[14] Jianfeng Gao, et al. Ranking, Boosting, and Model Adaptation, 2008.

[15] Elie Bienenstock, et al. Neural Networks and the Bias/Variance Dilemma, 1992, Neural Computation.

[16] Tie-Yan Liu, et al. Learning to rank: from pairwise approach to listwise approach, 2007, ICML '07.

[17] J. Friedman. Stochastic gradient boosting, 2002, Computational Statistics & Data Analysis.

[18] Raymond J. Mooney, et al. Combining Bias and Variance Reduction Techniques for Regression Trees, 2005, ECML.

[19] Rich Caruana, et al. Additive Groves of Regression Trees, 2007, ECML.

[20] Maksims Volkovs, et al. BoltzRank: learning to maximize expected ranking gain, 2009, ICML '09.

[21] Thore Graepel, et al. Large Margin Rank Boundaries for Ordinal Regression, 2000.

[22] Leo Breiman, et al. Bagging Predictors, 1996, Machine Learning.

[23] Sargur N. Srihari, et al. Decision Combination in Multiple Classifier Systems, 1994, IEEE Trans. Pattern Anal. Mach. Intell.

[24] Christopher J. C. Burges. From RankNet to LambdaRank to LambdaMART: An Overview, 2010.

[25] Tao Qin, et al. LETOR: A benchmark collection for research on learning to rank for information retrieval, 2010, Information Retrieval.

[26] Thorsten Joachims. Training linear SVMs in linear time, 2006, KDD '06.

[27] Leo Breiman, et al. Random Forests, 2001, Machine Learning.

[28] Koby Crammer, et al. Pranking with Ranking, 2001, NIPS.

[29] Filip Radlinski, et al. A support vector method for optimizing average precision, 2007, SIGIR.

[30] Hang Li, et al. AdaRank: a boosting algorithm for information retrieval, 2007, SIGIR.

[31] Giorgio Valentini, et al. Low Bias Bagged Support Vector Machines, 2003, ICML.

[32] Qiang Wu, et al. McRank: Learning to Rank Using Multiple Classification and Gradient Boosting, 2007, NIPS.