Optimising area under the ROC curve using gradient descent

This paper introduces RankOpt, a linear binary classifier which optimises the area under the ROC curve (the AUC). Unlike standard binary classifiers, RankOpt adopts the AUC statistic as its objective function, and optimises it directly using gradient descent. The problems with using the AUC statistic as an objective function are that it is non-differentiable, and of complexity O(n2) in the number of data observations. RankOpt uses a differentiable approximation to the AUC which is accurate, and computationally efficient, being of complexity O(n.) This enables the gradient descent to be performed in reasonable time. The performance of RankOpt is compared with a number of other linear binary classifiers, over a number of different classification problems. In almost all cases it is found that the performance of RankOpt is significantly better than the other classifiers tested.

[1]  H. B. Mann,et al.  On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other , 1947 .

[2]  Andrew P. Bradley,et al.  The use of the area under the ROC curve in the evaluation of machine learning algorithms , 1997, Pattern Recognit..

[3]  Mehryar Mohri,et al.  AUC Optimization vs. Error Rate Minimization , 2003, NIPS.

[4]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[5]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[6]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[7]  John C. Platt,et al.  Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .

[8]  J. J. Rocchio,et al.  Relevance feedback in information retrieval , 1971 .

[9]  Jeffrey S. Simonoff,et al.  Tree Induction Vs Logistic Regression: A Learning Curve Analysis , 2001, J. Mach. Learn. Res..

[10]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[11]  D. Bamber The area above the ordinal dominance graph and the area below the receiver operating characteristic graph , 1975 .

[12]  Foster Provost,et al.  The effect of class distribution on classifier learning: an empirical study , 2001 .

[13]  Michael C. Mozer,et al.  Optimizing Classifier Performance via an Approximation to the Wilcoxon-Mann-Whitney Statistic , 2003, ICML.