Margin-based Ranking and an Equivalence between AdaBoost and RankBoost

We study boosting algorithms for learning to rank. We give a general margin-based bound for ranking based on covering numbers for the hypothesis space. Our bound suggests that algorithms that maximize the ranking margin will generalize well. We then describe a new algorithm, smooth margin ranking, that precisely converges to a maximum ranking-margin solution. The algorithm is a modification of RankBoost, analogous to "approximate coordinate ascent boosting." Finally, we prove that AdaBoost and RankBoost are equally good for the problems of bipartite ranking and classification in terms of their asymptotic behavior on the training set. Under natural conditions, AdaBoost achieves an area under the ROC curve that is as good as RankBoost's; furthermore, RankBoost, when given a specific intercept, achieves a misclassification error that is as good as AdaBoost's. This may help to explain the empirical observations of Cortes and Mohri, and of Caruana and Niculescu-Mizil, that AdaBoost performs very well as a bipartite ranking algorithm, as measured by the area under the ROC curve.
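To make the two quantities in the abstract concrete, the following is a minimal sketch (not the authors' code) of how the area under the ROC curve and a ranking margin could be computed for a bipartite problem. It assumes the scores come from an already-normalized combination of weak rankers, so the margin is simply the smallest score gap over all positive-negative pairs; the function names `pairwise_auc` and `ranking_margin` are hypothetical.

```python
from itertools import product


def split_scores(scores, labels):
    """Separate scores into positives (label 1) and negatives (any other label)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    return pos, neg


def pairwise_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 1/2."""
    pos, neg = split_scores(scores, labels)
    total = 0.0
    for p, n in product(pos, neg):
        if p > n:
            total += 1.0
        elif p == n:
            total += 0.5
    return total / (len(pos) * len(neg))


def ranking_margin(scores, labels):
    """Smallest score gap over all (positive, negative) pairs.

    Assumption: scores are already normalized (for example, weak-ranker
    coefficients summing to 1), so no further scaling is applied here.
    """
    pos, neg = split_scores(scores, labels)
    return min(pos) - max(neg)
```

A positive margin means every positive example is scored above every negative one, in which case the pairwise AUC is 1; a negative margin indicates at least one misordered pair.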

[1]  Martin Gardner,et al.  The Colossal Book of Mathematics , 2001 .

[2]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[3]  Yoram Singer,et al.  Log-Linear Models for Label Ranking , 2003, NIPS.

[4]  Stéphan Clémençon,et al.  Ranking the Best Instances , 2006, J. Mach. Learn. Res.

[5]  Cynthia Rudin,et al.  Margin-Based Ranking Meets Boosting in the Middle , 2005, COLT.

[6]  Yoram Singer,et al.  An Efficient Boosting Algorithm for Combining Preferences , 2013 .

[7]  Mehryar Mohri,et al.  Confidence Intervals for the Area Under the ROC Curve , 2004, NIPS.

[8]  G. Lugosi,et al.  Ranking and empirical minimization of U-statistics , 2006, math/0603123.

[9]  Yoram Singer,et al.  Efficient Learning of Label Ranking by Soft Projections onto Polyhedra , 2006, J. Mach. Learn. Res.

[10]  Bin Yu,et al.  Boosting with early stopping: Convergence and consistency , 2005, math/0508276.

[11]  P. Gallinari,et al.  A Data-dependent Generalisation Error Bound for the AUC , 2005 .

[12]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[13]  Yoav Freund,et al.  Boosting the margin: A new explanation for the effectiveness of voting methods , 1997, ICML.

[14]  Gábor Lugosi,et al.  Ranking and Scoring Using Empirical Risk Minimization , 2005, COLT.

[15]  Ulf Brefeld,et al.  AUC maximizing support vector learning , 2005 .

[16]  Mehryar Mohri,et al.  AUC Optimization vs. Error Rate Minimization , 2003, NIPS.

[17]  Cynthia Rudin,et al.  The P-Norm Push: A Simple Convex Ranking Algorithm that Concentrates at the Top of the List , 2009, J. Mach. Learn. Res.

[18]  Y. Freund,et al.  Adaptive game playing using multiplicative weights , 1999 .

[19]  Rich Caruana,et al.  An empirical comparison of supervised learning algorithms , 2006, ICML.

[20]  Cynthia Rudin,et al.  Boosting Based on a Smooth Margin , 2004, COLT.

[21]  Gregory N. Hullender,et al.  Learning to rank using gradient descent , 2005, ICML.

[22]  O. Bousquet.  New approaches to statistical learning theory , 2003 .

[23]  S. D. Pietra,et al.  Duality and Auxiliary Functions for Bregman Distances , 2001 .

[24]  Cynthia Rudin,et al.  The Dynamics of AdaBoost: Cyclic Behavior and Convergence of Margins , 2004, J. Mach. Learn. Res.

[25]  R. Schapire,et al.  Analysis of boosting algorithms using the smooth margin function , 2007, 0803.4092.

[26]  Cynthia Rudin,et al.  Ranking with a P-Norm Push , 2006, COLT.

[27]  Yoram Singer,et al.  Improved Boosting Algorithms Using Confidence-rated Predictions , 1998, COLT '98.

[28]  Alexander J. Smola,et al.  Direct Optimization of Ranking Measures , 2007, ArXiv.

[29]  Yoram Singer,et al.  Logistic Regression, AdaBoost and Bregman Distances , 2000, Machine Learning.

[30]  Peter L. Bartlett,et al.  The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network , 1998, IEEE Trans. Inf. Theory.

[31]  Peter L. Bartlett,et al.  Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , 2003, J. Mach. Learn. Res.

[32]  Dan Roth,et al.  Generalization Bounds for the Area Under the ROC Curve , 2005, J. Mach. Learn. Res.

[33]  Tong Zhang,et al.  Statistical Analysis of Bayes Optimal Subset Ranking , 2008, IEEE Transactions on Information Theory.

[34]  Colin McDiarmid,et al.  Surveys in Combinatorics, 1989: On the method of bounded differences , 1989 .

[35]  Cynthia Rudin,et al.  Precise Statements of Convergence for AdaBoost and arc-gv , 2007 .