On Equivalence Relationships Between Classification and Ranking Algorithms

We demonstrate that certain machine learning algorithms can succeed at two separate tasks simultaneously, namely classification and bipartite ranking. This means that advantages gained from solving one task carry over to the other, such as the ability to obtain conditional density estimates and an order-of-magnitude reduction in the computational time needed for training. It also means that some algorithms are robust to the choice of evaluation metric: they can theoretically perform well whether performance is measured by misclassification error or by a statistic of the ROC curve (such as the area under the curve). Specifically, we provide such an equivalence relationship between a generalization of Freund et al.'s RankBoost algorithm, called the "P-Norm Push," and a particular cost-sensitive classification algorithm that generalizes AdaBoost, which we call "P-Classification." We discuss and validate the potential benefits of this equivalence relationship, and perform controlled experiments to understand P-Classification's empirical performance. Since there is no established equivalence relationship between logistic regression and its ranking counterpart, we introduce a logistic-regression-style algorithm that aims in between classification and ranking, and show that it has promising experimental performance with respect to both tasks.
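To make the two objectives concrete, the following NumPy sketch shows the exponential-loss forms that the P-Norm Push and P-Classification are built on, as we recall them from the cited works; the function names, the scaling of the negative term, and the toy data are illustrative assumptions, not the papers' notation.

```python
import numpy as np

def p_classification_loss(scores, labels, p=4):
    # Exponential loss on the positives plus a (1/p)-scaled,
    # p-steepened exponential penalty on the negatives.
    # At p = 1 this reduces to AdaBoost's exponential loss.
    pos, neg = scores[labels == 1], scores[labels == -1]
    return np.exp(-pos).sum() + (1.0 / p) * np.exp(p * neg).sum()

def p_norm_push_loss(scores, labels, p=4):
    # For each negative example, sum the exponential pairwise
    # loss against every positive, then raise that sum to the
    # p-th power. Large p concentrates the penalty on the
    # highest-scoring negatives, i.e., the top of the ranked
    # list. At p = 1 this is RankBoost's pairwise loss.
    pos, neg = scores[labels == 1], scores[labels == -1]
    per_negative = np.exp(-(pos[:, None] - neg[None, :])).sum(axis=0)
    return (per_negative ** p).sum()

# Toy usage with scores from a hypothetical fitted scorer.
rng = np.random.default_rng(0)
labels = np.array([1] * 5 + [-1] * 20)
scores = rng.normal(loc=np.where(labels == 1, 1.0, -1.0))
print(p_classification_loss(scores, labels))
print(p_norm_push_loss(scores, labels))
```

Note that the classification loss sums over single examples while the ranking loss sums over positive-negative pairs, which is the rough source of the training-time advantage claimed above when the classification objective can stand in for the ranking one.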

[1] Yang Wang, et al. Cost-sensitive boosting for classification of imbalanced data, 2007, Pattern Recognit.

[2] Eyke Hüllermeier, et al. Bipartite Ranking through Minimization of Univariate Loss, 2011, ICML.

[3] Gregory N. Hullender, et al. Learning to rank using gradient descent, 2005, ICML.

[4] Yoram Singer, et al. An Efficient Boosting Algorithm for Combining Preferences, 2003, J. Mach. Learn. Res.

[5] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.

[6] David H. Wolpert, et al. No free lunch theorems for optimization, 1997, IEEE Trans. Evol. Comput.

[7] Peter L. Bartlett, et al. Boosting Algorithms as Gradient Descent, 1999, NIPS.

[8] J. Friedman, et al. Additive logistic regression: A statistical view of boosting, 2000, Ann. Statist.

[9] Cynthia Rudin, et al. The Rate of Convergence of AdaBoost, 2011, COLT.

[10] Yoram Singer, et al. Logistic Regression, AdaBoost and Bregman Distances, 2000, Machine Learning.

[11] Cynthia Rudin. The P-Norm Push: A Simple Convex Ranking Algorithm that Concentrates at the Top of the List, 2009, J. Mach. Learn. Res.

[12] Rich Caruana, et al. An empirical comparison of supervised learning algorithms, 2006, ICML.

[13] Naoki Abe, et al. Multi-class cost-sensitive boosting with p-norm loss functions, 2008, KDD.

[14] David P. Helmbold, et al. A geometric approach to leveraging weak learners, 1999, Theor. Comput. Sci.

[15] Yoav Freund, et al. Boosting: Foundations and Algorithms, 2012.

[16] Maria-Florina Balcan, et al. Robust Reductions from Ranking to Classification, 2007, COLT.

[17] L. Breiman. Arcing the edge, 1997.

[18] Cynthia Rudin, et al. Margin-based Ranking and an Equivalence between AdaBoost and RankBoost, 2009, J. Mach. Learn. Res.

[19] Salvatore J. Stolfo, et al. AdaCost: Misclassification Cost-Sensitive Boosting, 1999, ICML.

[20] David Mease, et al. Boosted Classification Trees and Class Probability/Quantile Estimation, 2007, J. Mach. Learn. Res.

[21] Gunnar Rätsch, et al. Soft Margins for AdaBoost, 2001, Machine Learning.