The ROC Curve and the Area under It as Performance Measures

Abstract: The receiver operating characteristic (ROC) curve is a two-dimensional measure of classification performance, and the area under the ROC curve (AUC) is a scalar measure gauging one facet of that performance. In this short article, five idealized models are used to relate the shape of the ROC curve, and the area under it, to features of the underlying distribution of forecasts, allowing the former to be interpreted in terms of the latter. The analysis is pedagogical in that many of the findings are already known in more general (and more realistic) settings; however, the simplicity of the models considered here allows for a clear exposition of the relation. For example, although there are in general many reasons for an asymmetric ROC curve, the models considered here clearly illustrate that such an asymmetry can be attributed to unequal widths of the two forecast distributions. Furthermore, it is shown that AUC discriminates well between “good” and “bad” models, but not among good models.
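The abstract's central claim can be illustrated numerically. The sketch below, a minimal illustration rather than the article's own code, assumes a binormal setup in the spirit of the idealized models described: forecasts for non-events are drawn from N(0, 1) and forecasts for events from N(mu, sigma), where the names mu and sigma are illustrative parameters introduced here, not taken from the article. Sweeping a decision threshold across both Gaussians traces the ROC curve, and the binormal AUC has the closed form Phi(mu / sqrt(1 + sigma^2)).

```python
# A minimal sketch (not the article's code) of the binormal ROC model:
# non-events ~ N(0, 1), events ~ N(mu, sigma). mu and sigma are
# illustrative parameter names assumed here, not from the article.
import numpy as np
from scipy.stats import norm

def binormal_roc(mu, sigma, n_thresholds=501):
    """Hit rate and false-alarm rate as a threshold sweeps both Gaussians."""
    thresholds = np.linspace(-5.0, mu + 5.0 * sigma, n_thresholds)
    far = norm.sf(thresholds)                       # P(forecast > t | non-event)
    hit = norm.sf(thresholds, loc=mu, scale=sigma)  # P(forecast > t | event)
    return far, hit

def binormal_auc(mu, sigma):
    """Closed-form AUC for the binormal model: Phi(mu / sqrt(1 + sigma^2))."""
    return norm.cdf(mu / np.sqrt(1.0 + sigma**2))

# Equal widths (sigma = 1) yield an ROC curve symmetric about the
# anti-diagonal; unequal widths (e.g. sigma = 2) tilt the curve toward
# one axis, the asymmetry the abstract attributes to unequal widths.
for sigma in (1.0, 2.0):
    far, hit = binormal_roc(mu=1.5, sigma=sigma)
    print(f"sigma = {sigma}: AUC = {binormal_auc(1.5, sigma):.3f}")
```

Running the loop for sigma = 1 and sigma = 2 shows that the two curves differ visibly in shape while their AUC values remain fairly close, consistent with the observation that AUC separates good models from bad ones more readily than it separates good models from one another.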
