Ranking and empirical minimization of U-statistics

The problem of ranking or ordering instances, rather than simply classifying them, has recently gained much attention in machine learning. In this paper we formulate the ranking problem in a rigorous statistical framework. The goal is to learn a ranking rule that decides, between two instances, which one is "better," with minimum ranking risk. Since the natural estimates of the risk take the form of a U-statistic, results from the theory of U-processes are required to investigate the consistency of empirical risk minimizers. In particular, we establish a tail inequality for degenerate U-processes and apply it to show that fast rates of convergence may be achieved under specific noise assumptions, just as in classification. Convex risk minimization methods are also studied.
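As an illustration of why the natural risk estimate is a U-statistic, the following minimal Python sketch averages a pairwise ranking error over all ordered pairs of distinct observations. The abstract does not state the estimator explicitly, so the specifics here are assumptions: a ranking rule is taken to return +1 when its first argument is judged "better" and -1 otherwise, an error is counted when the rule disagrees with the sign of the label difference, and the names `empirical_ranking_risk` and `score_rule` are hypothetical.

```python
from itertools import permutations


def empirical_ranking_risk(xs, ys, rule):
    """Fraction of ordered pairs (i, j), i != j, that the rule misranks.

    Assumption: rule(x, x_prime) returns +1 if x is ranked above x_prime
    and -1 otherwise. A pair counts as an error when the sign of
    (y_i - y_j) disagrees with the rule's output. Averaging a pairwise
    kernel over all distinct pairs is what makes this a U-statistic of
    order two.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two observations and matching labels")
    errors = 0
    for i, j in permutations(range(n), 2):
        if (ys[i] - ys[j]) * rule(xs[i], xs[j]) < 0:
            errors += 1
    return errors / (n * (n - 1))


def score_rule(score):
    """Build a ranking rule from a real-valued scoring function (hypothetical helper)."""
    def rule(x, x_prime):
        return 1 if score(x) >= score(x_prime) else -1
    return rule


# Toy data: the identity score misorders only the pair with labels 3 and 2.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0]
risk = empirical_ranking_risk(xs, ys, score_rule(lambda x: x))
```

On this toy sample the pair of observations with labels 3 and 2 is misranked in both orders, so 2 of the 6 ordered pairs are errors. The quadratic loop over pairs is kept for clarity; it is not meant as an efficient implementation.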
