Assessment of medical imaging systems and computer aids: a tutorial review.

This article reviews the central issues that arise in the assessment of diagnostic imaging and computer-assist modalities. The paradigm of the receiver operating characteristic (ROC) curve, which plots the true-positive fraction against the false-positive fraction as the reader's (radiologist's) level of aggressiveness toward a positive call varies, is essential to this field because diagnostic imaging systems are used in multiple settings, including controlled laboratory studies in which the prevalence of disease differs from that encountered in the field. The basic equation of statistical decision theory is used to show how readers can vary their level of aggressiveness according to this diagnostic context. Most studies of diagnostic modalities in the last 15 years have demonstrated not only a range of levels of reader aggressiveness but also a range of levels of reader performance. These characteristics require a multivariate approach to ROC analysis that accounts for both the variation in case difficulty and the variation in reader skill in a study. The resulting paradigm is called the multiple-reader, multiple-case ROC paradigm. Highlights of historic as well as contemporary work in this field are reviewed. Many practical issues related to study design and the resulting statistical power are included, together with recent developments in analytical software and its availability.
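As an illustration of the ROC paradigm described above, the following minimal Python sketch builds an empirical ROC curve by sweeping a decision threshold over reader confidence ratings, estimates the area under it with the Mann-Whitney statistic, and evaluates the standard expected-utility expression for the ROC slope at the optimal operating point. The function names, the rating data, and the utility values are hypothetical and are not taken from the article; the slope formula is the usual textbook form, assumed here to correspond to the "basic equation of statistical decision theory" mentioned in the abstract.

```python
from typing import List, Tuple


def roc_points(diseased: List[float], healthy: List[float]) -> List[Tuple[float, float]]:
    """Return (FPF, TPF) pairs obtained by sweeping a decision threshold.

    A case is called positive when its rating is >= the threshold, so a
    lower threshold corresponds to a more aggressive reader.
    """
    thresholds = sorted(set(diseased) | set(healthy), reverse=True)
    points = [(0.0, 0.0)]  # strictest setting: nothing is called positive
    for t in thresholds:
        tpf = sum(x >= t for x in diseased) / len(diseased)
        fpf = sum(x >= t for x in healthy) / len(healthy)
        points.append((fpf, tpf))
    return points


def empirical_auc(diseased: List[float], healthy: List[float]) -> float:
    """Mann-Whitney estimate of the area under the empirical ROC curve."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0   # diseased case rated higher: correct ordering
            elif d == h:
                wins += 0.5   # ties count as half
    return wins / (len(diseased) * len(healthy))


def optimal_roc_slope(prevalence: float, u_tp: float, u_fn: float,
                      u_tn: float, u_fp: float) -> float:
    """ROC slope at the operating point that maximizes expected utility.

    slope = [(1 - p) / p] * [(U_TN - U_FP) / (U_TP - U_FN)]
    """
    return ((1.0 - prevalence) / prevalence) * ((u_tn - u_fp) / (u_tp - u_fn))


# Hypothetical 5-point confidence ratings (higher = more suspicious).
diseased_ratings = [3, 4, 5, 5, 2]
healthy_ratings = [1, 2, 3, 1, 2]

curve = roc_points(diseased_ratings, healthy_ratings)
auc = empirical_auc(diseased_ratings, healthy_ratings)

# Same (hypothetical) utilities, two prevalences: enriched lab study vs. screening.
lab_slope = optimal_roc_slope(0.5, u_tp=1.0, u_fn=0.0, u_tn=1.0, u_fp=0.0)
field_slope = optimal_roc_slope(0.005, u_tp=1.0, u_fn=0.0, u_tn=1.0, u_fp=0.0)
```

With utilities held fixed, the optimal slope grows as prevalence falls, which pushes the preferred operating point toward the lower-left (less aggressive) end of the curve. This is one way to see why a reader may behave differently in an enriched laboratory study than in the field while the underlying ROC curve stays the same.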
