ROC analysis in medical imaging: a tutorial review of the literature

Receiver operating characteristic (ROC) analysis measures the “diagnostic accuracy” of a medical imaging system, which represents the second level of diagnostic efficacy in the hierarchical model described by Fryback and Thornbury (Med Decis Making 11:88–94, 1991). After describing the historical origins of ROC analysis, this paper reviews the importance of sampling cases appropriately, designing an observer study to avoid bias, and collecting data on a useful scale. A variety of methods for fitting ROC curves to observer data and testing the statistical significance of apparent differences are then reported. Finally, generalized forms of ROC analysis that require lesion localization or allow more than two states of truth are surveyed briefly.
