A Data-dependent Generalisation Error Bound for the AUC

The optimisation of the Area Under the ROC Curve (AUC) has recently been proposed for learning ranking functions. However, estimating the AUC of a function on the true distribution of the examples from its empirical value is still an open problem. In this paper, we present a data-dependent generalisation error bound for the AUC. This bound has the advantage of being tight, and it also allows practical conclusions to be drawn about learning algorithms that optimise the AUC. In particular, we show that, in the case of the AUC, kernel function classes have strong generalisation guarantees provided that the weights of the functions are small. This suggests that regularisation procedures which limit the norm of the weight vector may lead to better generalisation performance for algorithms that optimise the AUC.
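To make the quantity being bounded concrete, here is a minimal sketch of the empirical AUC: the fraction of (positive, negative) example pairs that a scoring function orders correctly. The function name `empirical_auc` and the convention of counting ties as one half are assumptions made for illustration; the paper's exact definition may differ.

```python
from itertools import product


def empirical_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly.

    Assumption for illustration: a tie between a positive and a
    negative score counts as half a correctly ordered pair.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    total = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            total += 1.0   # positive example scored above the negative one
        elif p == n:
            total += 0.5   # tie: half credit (assumed convention)
    return total / (len(pos_scores) * len(neg_scores))
```

For example, `empirical_auc([0.9, 0.8, 0.4], [0.7, 0.3])` orders five of the six pairs correctly. A generalisation bound of the kind described above controls how far this empirical value can lie from the AUC on the true distribution.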
