Evaluation of diagnostic performance using the partial area under the ROC curve

Evaluation of diagnostic performance is critical in many fields, including but not limited to diagnostic medicine. The Receiver Operating Characteristic (ROC) curve is the most widely used methodology for describing the intrinsic performance of diagnostic tests, and the area under the curve (AUC) is the most commonly used summary index of overall performance. When restricted to the range of practical or clinical relevance, the partial area under the ROC curve (pAUC) is considered a more relevant summary index than the full AUC. However, several conceptual and analytical difficulties frequently prevent the pAUC from being used. First, in many diagnostic settings the relevant range is difficult to determine objectively. Second, because it may use less information, analysis based on the pAUC could in theory lose statistical precision and would therefore require larger sample sizes. Through mathematical derivation, extensive simulation studies, and practical examples, this work investigates the statistical properties of pAUC-based inferences. First, it demonstrates that in many practical scenarios inferences based on the pAUC could be more powerful than inferences based on the full AUC. Thus, use of the pAUC may lead to results that are not only more clinically relevant but also more conclusive in analyses of experimental data. Second, it demonstrates that the advantages of pAUC-based inferences depend on the shape of the ROC curve. The conventional binormal model does not always adequately describe scenarios in which the pAUC is more statistically efficient. The bi-gamma family of concave ROC curves is shown to describe practically reasonable scenarios in which either the pAUC or the full AUC could be advantageous. Programs for sample size estimation based on the bi-gamma model are then developed.
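To make the dependence on curve shape concrete, the sketch below compares two ROC curves that share the same full AUC of 0.8. It is a minimal Python illustration under stated assumptions, not the sample size programs developed in this work. The binormal curve uses the standard form TPR = Φ(a + b·Φ⁻¹(FPR)). The bi-gamma curve is assumed to arise from gamma-distributed results with a common shape and different scales, and the shape is fixed at 1 (exponential) so that the curve has the closed form TPR = FPR^r. The parameter values, the FPR range [0, 0.1], and the helper names (`binormal_roc`, `biexponential_roc`, `partial_auc`) are hypothetical, and the pAUC is obtained numerically by composite Simpson's rule.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.cdf is Phi, PHI.inv_cdf is Phi^{-1}


def binormal_roc(t, a, b):
    """Binormal ROC curve: TPR = Phi(a + b * Phi^{-1}(FPR))."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    return PHI.cdf(a + b * PHI.inv_cdf(t))


def binormal_auc(a, b):
    """Closed-form full AUC of the binormal curve: Phi(a / sqrt(1 + b^2))."""
    return PHI.cdf(a / math.sqrt(1.0 + b * b))


def biexponential_roc(t, r):
    """Assumed bi-gamma special case with common shape 1 (exponential results).
    Negatives ~ Exp(scale s0), positives ~ Exp(scale s1), r = s0 / s1 in (0, 1).
    A threshold c gives FPR = exp(-c / s0) and TPR = exp(-c / s1), so TPR = FPR ** r."""
    return t ** r


def partial_auc(roc, e, steps=2000):
    """Area under roc(t) for t in [0, e] by composite Simpson's rule (steps must be even)."""
    h = e / steps
    total = roc(0.0) + roc(e)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * roc(i * h)
    return total * h / 3


# Illustrative (hypothetical) parameters chosen so that both curves have full AUC = 0.8.
r = 0.25                                       # bi-exponential AUC = 1 / (1 + r)
a, b = math.sqrt(2.0) * PHI.inv_cdf(0.8), 1.0  # binormal AUC = Phi(a / sqrt(2))
e = 0.1                                        # hypothetical relevant FPR range [0, e]

pauc_gamma = partial_auc(lambda t: biexponential_roc(t, r), e)
pauc_normal = partial_auc(lambda t: binormal_roc(t, a, b), e)
print(f"full AUC: {1 / (1 + r):.3f} vs {binormal_auc(a, b):.3f}")
print(f"pAUC on [0, {e}]: {pauc_gamma:.4f} vs {pauc_normal:.4f}")
```

With these illustrative values, the concave bi-exponential curve places clearly more area in the low-FPR range (about 0.045, matching the closed form e^(1+r)/(1+r)) than the binormal curve with the same AUC (roughly 0.03). The sketch compares point values only; by itself it says nothing about the variance of pAUC estimates, which is what governs power and sample size.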
Finally, this work investigates the properties of pAUC-based inferences in scenarios where diagnostic results have substantial ties (a "mass") at the lowest result value. For certain types of ROC curves, the existence of ties could increase statistical efficiency. Forcing a diagnostic system to resolve ties could detrimentally affect the reliability and conclusiveness of statistical inferences. In conclusion, this work provides investigators with insights into, and tools for, generating practically relevant conclusions using the pAUC. The public health importance of this work stems from the relevance of ROC analysis at different stages of the development and regulatory approval of diagnostic systems in medicine. Enhanced methodology for evaluating diagnostic accuracy helps in the development of improved diagnostic systems and could accelerate the delivery and clinical adoption of truly beneficial diagnostic technologies and/or clinical practices.
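The effect of a mass of tied results at the lowest value can be seen directly in the empirical ROC curve. The sketch below is a minimal Python illustration on made-up toy data, not an analysis from this work. It computes the empirical ROC vertices, the trapezoidal pAUC over FPR in [0, e], and the full AUC as the Mann-Whitney statistic with ties counted as 1/2 (the trapezoidal full area and this statistic coincide). Cases tied at 0 enter the curve together as one straight segment starting at FPR = 0.4, so the pAUC over [0, 0.4] does not depend on them. As a hypothetical stand-in for forcing the system to resolve ties, the zeros are then replaced by tiny random scores. The helper names (`roc_points`, `partial_area`, `mann_whitney_auc`) are illustrative assumptions.

```python
import random


def roc_points(neg, pos):
    """Empirical ROC vertices (FPR, TPR), sweeping thresholds from high to low.
    A case is called positive when score >= threshold, so cases tied at one
    value enter together and produce a single straight (diagonal) segment."""
    pts = [(0.0, 0.0)]
    for v in sorted(set(neg) | set(pos), reverse=True):
        fpr = sum(x >= v for x in neg) / len(neg)
        tpr = sum(y >= v for y in pos) / len(pos)
        pts.append((fpr, tpr))
    return pts


def partial_area(pts, e):
    """Trapezoidal area under the empirical ROC between FPR = 0 and FPR = e."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= e:
            break
        if x1 == x0:
            continue  # vertical step: contributes no area
        hi = min(x1, e)
        y_hi = y0 + (y1 - y0) * (hi - x0) / (x1 - x0)  # interpolated TPR at hi
        area += (hi - x0) * (y0 + y_hi) / 2
    return area


def mann_whitney_auc(neg, pos):
    """Full AUC as P(Y > X) + 0.5 * P(Y == X); equals partial_area(roc_points(neg, pos), 1.0)."""
    wins = sum((y > x) + 0.5 * (y == x) for x in neg for y in pos)
    return wins / (len(neg) * len(pos))


# Made-up toy data with a mass at the lowest value (0) in both groups.
neg = [0] * 6 + [1, 2, 3, 5]               # non-diseased: 60% tied at 0
pos = [0] * 2 + [2, 4, 5, 6, 7, 8, 9, 10]  # diseased: 20% tied at 0
pts = roc_points(neg, pos)
# prints: AUC 0.82, pAUC[0, 0.4] 0.28
print(f"AUC {mann_whitney_auc(neg, pos):.2f}, pAUC[0, 0.4] {partial_area(pts, 0.4):.2f}")

# Hypothetical forced tie resolution: each 0 becomes a tiny random score (< 1e-6),
# which keeps it below every nonzero result but orders the tied cases at random.
rng = random.Random(1)
aucs, paucs = [], []
for _ in range(200):
    n2 = [x if x > 0 else rng.random() * 1e-6 for x in neg]
    p2 = [y if y > 0 else rng.random() * 1e-6 for y in pos]
    aucs.append(mann_whitney_auc(n2, p2))
    paucs.append(partial_area(roc_points(n2, p2), 0.4))
print(f"AUC over draws: {min(aucs):.2f} to {max(aucs):.2f}")
print(f"pAUC[0, 0.4] over draws: {min(paucs):.2f} to {max(paucs):.2f}")
```

Under random tie resolution the full AUC varies from draw to draw (between 0.76 and 0.88 for these data), while the pAUC over [0, 0.4] stays fixed. This toy example only shows the mechanism; whether ties increase efficiency depends on the shape of the ROC curve and is not demonstrated by a single data set.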
