Tree-Based Ranking Methods

This paper investigates how recursive partitioning methods can be adapted to the bipartite ranking problem. In ranking, the goal is global: based on past data, define an order on the whole input space X so that positive instances occupy the top ranks with maximum probability. The most natural way to order all instances is to project the input data onto the real line through a real-valued scoring function s and to use the natural order on R. The accuracy of the ordering induced by a candidate s is classically measured in terms of the ROC curve or the AUC. Here we discuss the design of tree-structured scoring functions obtained by recursively maximizing the AUC criterion. The connection with recursive piecewise linear approximation of the optimal ROC curve, in both the L1 sense and the L∞ sense, is highlighted. A novel tree-based algorithm for ranking, called TreeRank, is proposed. Consistency results and generalization bounds of functional nature are established for this ranking method, with respect to either the L1 or the L∞ distance. We also describe committee-based learning procedures that use TreeRank as a "base ranker" in order to overcome obvious drawbacks of such a top-down partitioning technique. Simulation results on artificial data are also presented.
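To make the AUC criterion concrete, the following minimal sketch computes the empirical AUC of a piecewise constant scoring function, that is, the fraction of (positive, negative) pairs that the score orders correctly, with ties counted as one half. It is not the TreeRank algorithm itself: the function names `empirical_auc` and `tree_score`, the fixed depth-2 partition of [0, 1], and the toy data are all assumptions made for illustration.

```python
from itertools import product


def empirical_auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    total = 0.0
    for sp, sn in product(scores_pos, scores_neg):
        if sp > sn:
            total += 1.0
        elif sp == sn:
            total += 0.5
    return total / (len(scores_pos) * len(scores_neg))


def tree_score(x):
    """Hypothetical depth-2 tree on [0, 1]: each of the four cells gets a constant score."""
    if x < 0.5:
        return 1 if x < 0.25 else 2
    return 3 if x < 0.75 else 4


# Toy one-dimensional sample (illustrative values, not from the paper).
positives = [0.9, 0.8, 0.6, 0.3]
negatives = [0.1, 0.2, 0.4, 0.7]

auc = empirical_auc(
    [tree_score(x) for x in positives],
    [tree_score(x) for x in negatives],
)
print(auc)  # 14 of the 16 pairs are credited (ties as 1/2), so this prints 0.875
```

Because a tree-structured score is constant on each cell, instances falling in the same cell are tied, which is why the tie convention matters here. A finer partition can only break ties, and this is the quantity a recursive AUC-maximizing procedure would try to increase at each split.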
