Generalization Bounds for the Area Under the ROC Curve

We study generalization properties of the area under the ROC curve (AUC), a quantity that has been advocated as an evaluation criterion for the bipartite ranking problem. The AUC is a different quantity from the error rate used for evaluation in classification problems; consequently, existing generalization bounds for the classification error rate cannot be used to draw conclusions about the AUC. In this paper, we define the expected accuracy of a ranking function (analogous to the expected error rate of a classification function) and derive distribution-free probabilistic bounds on the deviation of the empirical AUC of a ranking function (observed on a finite data sequence) from its expected accuracy. We derive two bounds: a large deviation bound, which bounds the expected accuracy of a ranking function in terms of its empirical AUC on a test sequence, and a uniform convergence bound, which bounds the expected accuracy of a learned ranking function in terms of its empirical AUC on a training sequence. Our uniform convergence bound is expressed in terms of a new set of combinatorial parameters that we term the bipartite rank-shatter coefficients; these play the same role in our result as the standard VC-dimension related shatter coefficients (also known as the growth function) play in uniform convergence results for the classification error rate. A comparison of our result with a recent uniform convergence result derived by Freund et al. (2003) for a quantity closely related to the AUC shows that the bound provided by our result can be considerably tighter.
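
To make the central quantity concrete, the following is a minimal sketch of how the empirical AUC of a ranking function might be computed on a finite data sequence. The abstract does not state the formula, so the sketch assumes the standard Wilcoxon-Mann-Whitney form: the fraction of positive-negative pairs that the scoring function orders correctly, with ties counted as one half. The names `empirical_auc`, `score`, `positives`, and `negatives` are illustrative and do not come from the paper.

```python
from typing import Callable, Sequence, TypeVar

X = TypeVar("X")


def empirical_auc(
    score: Callable[[X], float],
    positives: Sequence[X],
    negatives: Sequence[X],
) -> float:
    """Fraction of (positive, negative) pairs ranked correctly by `score`.

    Assumption: ties between a positive and a negative score count as 1/2,
    as in the usual Wilcoxon-Mann-Whitney statistic.
    """
    m, n = len(positives), len(negatives)
    if m == 0 or n == 0:
        raise ValueError("need at least one positive and one negative example")

    pos_scores = [score(x) for x in positives]
    neg_scores = [score(x) for x in negatives]

    total = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                total += 1.0   # pair ordered correctly
            elif sp == sn:
                total += 0.5   # tie: half credit (assumed convention)
    return total / (m * n)


# Usage: instances are already real-valued scores, so the ranking
# function is the identity.
auc = empirical_auc(lambda x: x, [0.9, 0.8, 0.4], [0.7, 0.4, 0.1])
print(round(auc, 4))
```

The double loop takes O(mn) time, which keeps the pairwise definition visible; a sort-based computation would be preferable for large samples.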

[1]  R. Buck Partition of Space , 1943 .

[2]  D. van Dantzig On the consistency and the power of Wilcoxon's two sample test , 1951, Proceedings KNAW Series A 54(1), Indagationes Mathematicae 13, pp. 1-8.

[3]  J. Davenport Editor , 1960 .

[4]  J. Rustagi Bounds for the variance of Mann-Whitney statistic , 1961 .

[5]  W. Hoeffding Probability Inequalities for Sums of Bounded Random Variables , 1963 .

[6]  Vladimir Vapnik, A. Chervonenkis  On the uniform convergence of relative frequencies of events to their probabilities , 1971 .

[7]  Norbert Sauer,et al.  On the Density of Families of Sets , 1972, J. Comb. Theory A.

[8]  James P. Egan,et al.  Signal detection theory and ROC analysis , 1975 .

[9]  J. Hanley,et al.  The meaning and use of the area under a receiver operating characteristic (ROC) curve. , 1982, Radiology.

[10]  Colin McDiarmid,et al.  On the method of bounded differences , 1989, Surveys in Combinatorics.

[11]  J. Siemons Surveys in combinatorics, 1989 , 1989 .

[12]  Thomas G. Dietterich,et al.  In Advances in Neural Information Processing Systems 12 , 1991, NIPS 1991.

[13]  L. Devroye Exponential Inequalities in Nonparametric Estimation , 1991 .

[14]  László Györfi,et al.  A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.

[15]  G. Garrido Cantarero,et al.  [The area under the ROC curve]. , 1996, Medicina clinica.

[16]  Yoram Singer,et al.  Learning to Order Things , 1997, NIPS.

[17]  Yoram Singer,et al.  An Efficient Boosting Algorithm for Combining Preferences , 2003 .

[18]  E. Giné,et al.  Decoupling: From Dependence to Independence , 1998 .

[19]  Peter L. Bartlett,et al.  Learning in Neural Networks: Theoretical Foundations , 1999 .

[20]  Ralf Herbrich,et al.  Large margin rank boundaries for ordinal regression , 2000 .

[21]  Thore Graepel,et al.  Large Margin Rank Boundaries for Ordinal Regression , 2000 .

[22]  Alexander J. Smola,et al.  Advances in Large Margin Classifiers , 2000 .

[23]  Koby Crammer,et al.  Pranking with Ranking , 2001, NIPS.

[24]  R. Herbrich,et al.  Average Precision and the Problem of Generalisation , 2002 .

[25]  Jiri Matousek,et al.  Lectures on discrete geometry , 2002, Graduate texts in mathematics.

[26]  Manfred K. Warmuth,et al.  Relating Data Compression and Learnability , 2003 .

[27]  Mehryar Mohri,et al.  AUC Optimization vs. Error Rate Minimization , 2003, NIPS.

[28]  Michael C. Mozer,et al.  Optimizing Classifier Performance via an Approximation to the Wilcoxon-Mann-Whitney Statistic , 2003, ICML.

[29]  Saharon Rosset,et al.  Model selection via the AUC , 2004, ICML.

[30]  Mehryar Mohri,et al.  Confidence Intervals for the Area Under the ROC Curve , 2004, NIPS.

[31]  Dan Roth,et al.  A Large Deviation Bound for the Area Under the ROC Curve , 2004, NIPS.

[32]  Dan Roth,et al.  A Uniform Convergence Bound for the Area Under the ROC Curve , 2005, AISTATS.

[33]  John Shawe-Taylor,et al.  PAC-Bayesian Compression Bounds on the Prediction Error of Learning Algorithms for Classification , 2005, Machine Learning.

[34]  V. Vapnik Estimation of Dependences Based on Empirical Data , 2006 .

[35]  M. Kenward,et al.  An Introduction to the Bootstrap , 2007 .