Training Multiclass Classifiers by Maximizing the Volume Under the ROC Surface

A receiver operating characteristic (ROC) curve plots a ranking classifier's true-positive rate against its false-positive rate as the threshold between positive and negative classifications is varied over its full range. The area under the ROC curve offers a measure of the discriminatory power of a machine learning algorithm that is independent of the class distribution, via its equivalence to the Mann-Whitney U-statistic. This measure has recently been extended to problems of discriminating among three or more classes, in which case the area under the curve generalizes to the volume under the ROC surface. In this paper, we show how a multiclass classifier can be trained by directly maximizing the volume under the ROC surface. This is accomplished by first constructing a continuous approximation of the discrete U-statistic that is equivalent to the volume under the surface, and then maximizing this approximation by gradient ascent.