Confidence Bands for ROC Curves

We address the problem of comparing the performance of classifiers. In this paper we study techniques for generating and evaluating confidence bands on ROC curves. Historically this has been done using one-dimensional confidence intervals by freezing one variable: either the false-positive rate or the threshold on the classification scoring function. We adapt two prior methods and introduce a new radial sweep method to generate confidence bands. We show, through empirical studies, that the bands are too tight, and we introduce a general optimization methodology for creating bands that better fit the data, as well as methods for evaluating confidence bands. We show empirically that the optimized confidence bands fit much better and that, using our new evaluation method, it is possible to gauge the relative fit of different confidence bands.
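As an illustration of the one-dimensional approach mentioned above, the following minimal sketch computes pointwise confidence intervals on the true-positive rate at frozen false-positive rates. It is not the method of this paper: the stratified bootstrap, the percentile intervals, and all function names (`roc_points`, `tpr_at_fpr`, `vertical_intervals`) are assumptions chosen for illustration.

```python
import random


def roc_points(scores, labels):
    """Return ROC points (fpr, tpr) from (0, 0) to (1, 1), sweeping the threshold down."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume every instance tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def tpr_at_fpr(points, fpr):
    """Interpolate TPR at a frozen FPR; on vertical segments take the highest TPR."""
    best = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= fpr <= x1:
            if x1 == x0:
                y = max(y0, y1)
            else:
                y = y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
            best = max(best, y)
    return best


def vertical_intervals(scores, labels, fpr_grid, n_boot=200, alpha=0.05, seed=0):
    """Pointwise (1 - alpha) percentile intervals on TPR at each frozen FPR.

    Assumption: a stratified bootstrap (positives and negatives resampled
    separately) so that every resample contains both classes.
    """
    rng = random.Random(seed)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    samples = {f: [] for f in fpr_grid}
    for _ in range(n_boot):
        bp = [rng.choice(pos) for _ in pos]
        bn = [rng.choice(neg) for _ in neg]
        pts = roc_points(bp + bn, [1] * len(bp) + [0] * len(bn))
        for f in fpr_grid:
            samples[f].append(tpr_at_fpr(pts, f))
    intervals = {}
    for f in fpr_grid:
        v = sorted(samples[f])
        lo = v[int((alpha / 2) * (n_boot - 1))]
        hi = v[int((1 - alpha / 2) * (n_boot - 1))]
        intervals[f] = (lo, hi)
    return intervals
```

Each interval here covers the curve at a single false-positive rate only. Joining such pointwise intervals does not by itself give a band with simultaneous coverage of the whole curve, which is consistent with the abstract's observation that such bands are too tight.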
