Variance estimation for two-class and multi-class ROC analysis using operating point averaging

Receiver operating characteristic (ROC) analysis enables fine-tuning of a trained classifier to a desired performance trade-off. An ROC curve estimated from a finite test set is, however, insufficient for comparing classifiers because it neglects the variance of the performance estimates. This research presents a practical algorithm for variance estimation at individual operating points of ROC curves or surfaces. It generalizes the threshold averaging of Fawcett et al. to arbitrary operating point definitions, including the weighting-based formulation used in multi-class ROC analysis. A statistical test comparing performance differences between operating points of the same curve is illustrated for two-class and multi-class ROC.
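
To make the idea of threshold averaging concrete, the following minimal sketch estimates the mean and variance of the false positive rate and true positive rate at a single, fixed decision threshold across several test folds. It is an illustration under stated assumptions, not the algorithm proposed here: it covers only the two-class, threshold-defined operating point, the function names (`rates_at_threshold`, `threshold_average`, `make_fold`) are hypothetical, and the folds are synthetic Gaussian scores generated purely for demonstration.

```python
import random
import statistics


def rates_at_threshold(scores, labels, threshold):
    """Return (FPR, TPR) when samples with score >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return fp / n_neg, tp / n_pos


def threshold_average(folds, threshold):
    """Mean and sample variance of FPR and TPR across folds at one threshold."""
    points = [rates_at_threshold(s, y, threshold) for s, y in folds]
    fprs = [p[0] for p in points]
    tprs = [p[1] for p in points]
    return {
        "fpr_mean": statistics.mean(fprs),
        "fpr_var": statistics.variance(fprs),
        "tpr_mean": statistics.mean(tprs),
        "tpr_var": statistics.variance(tprs),
    }


def make_fold(rng, n_per_class=200):
    """Synthetic fold (assumption): negatives ~ N(0, 1), positives ~ N(1.5, 1)."""
    neg = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
    pos = [rng.gauss(1.5, 1.0) for _ in range(n_per_class)]
    return neg + pos, [0] * n_per_class + [1] * n_per_class


rng = random.Random(0)  # fixed seed so the sketch is reproducible
folds = [make_fold(rng) for _ in range(10)]
summary = threshold_average(folds, threshold=0.75)
print(summary)
```

The per-fold points are averaged at the same threshold rather than at the same false positive rate, which is what allows a variance to be attached to one operating point in both ROC dimensions.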