ROC confidence bands: an empirical evaluation

This paper is about constructing confidence bands around ROC curves. We first introduce to the machine learning community three band-generating methods from the medical field and evaluate how well they perform. Such confidence bands represent the region in which the "true" ROC curve is expected to lie at the designated confidence level. To assess the containment of the bands (how often a band actually contains the true curve), we begin with a synthetic world in which the true ROC curve is known, specifically one where the class-conditional model scores are normally distributed. The only method that attains reasonable containment out of the box produces non-parametric, "fixed-width" bands (FWBs). Next we move to a context more appropriate for machine learning evaluations: bands that, with a given confidence level, bound the performance of the model on future data. We introduce a correction to account for the larger uncertainty in this setting, and the widened FWBs continue to have reasonable containment. Finally, we assess the bands on 10 relatively large benchmark data sets. We conclude by recommending these FWBs, noting that, being non-parametric, they are especially attractive for machine learning studies, where the score distributions (1) clearly are not normal and (2) vary substantially from learning method to learning method, even on the same data set.
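
The containment experiment in the synthetic setting can be illustrated with a minimal sketch. It is not the paper's procedure: the band below is a simplified vertical fixed-width band whose half-width is taken from a bootstrap of the maximum deviation, and all function names, sample sizes, and distribution parameters are hypothetical choices made for illustration. It only shows the general idea of drawing normally distributed class-conditional scores, building a band around the empirical ROC curve, and checking how often the known true curve falls inside it.

```python
import numpy as np
from statistics import NormalDist


def empirical_roc(neg, pos, fpr_grid):
    """Empirical TPR at each FPR in fpr_grid (vertical slices)."""
    # Threshold for a given FPR: the (1 - FPR) quantile of the negative scores.
    thresholds = np.quantile(neg, 1.0 - fpr_grid)
    # TPR is the fraction of positive scores above each threshold.
    return (pos[None, :] > thresholds[:, None]).mean(axis=1)


def binormal_roc(fpr_grid, mu_neg, mu_pos, sigma_neg, sigma_pos):
    """True ROC when both class-conditional score distributions are normal."""
    nd = NormalDist()
    return np.array([
        nd.cdf((mu_pos - mu_neg + sigma_neg * nd.inv_cdf(f)) / sigma_pos)
        for f in fpr_grid
    ])


def fixed_width_band(neg, pos, fpr_grid, alpha=0.05, n_boot=200, rng=None):
    """Simplified fixed-width band (an assumption, not the paper's method).

    The half-width is the (1 - alpha) quantile of the maximum vertical
    deviation between bootstrap ROC curves and the empirical ROC curve.
    """
    rng = np.random.default_rng() if rng is None else rng
    center = empirical_roc(neg, pos, fpr_grid)
    sup_dev = np.empty(n_boot)
    for b in range(n_boot):
        neg_b = rng.choice(neg, size=neg.size, replace=True)
        pos_b = rng.choice(pos, size=pos.size, replace=True)
        boot = empirical_roc(neg_b, pos_b, fpr_grid)
        sup_dev[b] = np.max(np.abs(boot - center))
    half_width = float(np.quantile(sup_dev, 1.0 - alpha))
    lower = np.clip(center - half_width, 0.0, 1.0)
    upper = np.clip(center + half_width, 0.0, 1.0)
    return lower, upper, half_width


def containment_rate(n_trials=50, n=200, alpha=0.05, n_boot=100, seed=0):
    """Fraction of trials in which the band contains the whole true ROC curve."""
    rng = np.random.default_rng(seed)
    fpr_grid = np.linspace(0.01, 0.99, 99)
    # Hypothetical synthetic world: negatives ~ N(0, 1), positives ~ N(1, 1).
    true_tpr = binormal_roc(fpr_grid, 0.0, 1.0, 1.0, 1.0)
    hits = 0
    for _ in range(n_trials):
        neg = rng.normal(0.0, 1.0, n)
        pos = rng.normal(1.0, 1.0, n)
        lower, upper, _ = fixed_width_band(neg, pos, fpr_grid, alpha, n_boot, rng)
        hits += bool(np.all((lower <= true_tpr) & (true_tpr <= upper)))
    return hits / n_trials


if __name__ == "__main__":
    print("observed containment:", containment_rate())
```

Comparing the observed containment with the nominal level 1 - alpha is the kind of check the synthetic evaluation relies on. A band intended to bound performance on future data would need to be wider than this one, since the future sample contributes its own variability.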
