Validation and statistical power comparison of methods for analyzing free-response observer performance studies.

RATIONALE AND OBJECTIVES: The aim of this work was to validate and compare the statistical powers of proposed methods for analyzing free-response data using a search-model-based simulator.

MATERIALS AND METHODS: A free-response data simulator is described that can model a single reader interpreting the same cases in two modalities, or two computer-aided detection (CAD) algorithms, or two human observers, interpreting the same cases in one modality. A variance components model, analogous to the Roe and Metz receiver-operating characteristic (ROC) data simulator, is described; it models intracase and intermodality correlations in free-response studies. Two generic observers were simulated: a quasi-human observer and a quasi-CAD algorithm. Null hypothesis (NH) validity and statistical powers of ROC, jackknife alternative free-response operating characteristic (JAFROC), a variant of JAFROC termed JAFROC-1, initial detection and candidate analysis (IDCA), and a nonparametric (NP) approach were investigated.

RESULTS: All methods had valid NH behavior over a wide range of simulator parameters. For equal numbers of normal and abnormal cases, for the human observer, the statistical power ranking of the methods was JAFROC-1 > JAFROC > (IDCA approximately NP) > ROC. For the CAD algorithm, the ranking was (NP approximately IDCA) > (JAFROC-1 approximately JAFROC) > ROC. In either case, the statistical power of the highest ranked method exceeded that of the lowest ranked method by about a factor of two. Dependence of statistical power on simulator parameters followed expected trends. For data sets with more abnormal cases than normal cases, JAFROC-1 power significantly exceeded JAFROC power.

CONCLUSION: Based on this work, the recommendation is to use JAFROC-1 for human observers (including human observers with CAD assist) and the NP method for evaluating CAD algorithms.
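The general idea behind checking NH validity and estimating statistical power with a simulator can be illustrated with a minimal Python sketch. It is a toy stand-in, not the search-model simulator or any of the analysis methods studied here: each simulated case gets one Gaussian score per modality, a shared case-level effect induces a correlation `rho` between the two modalities, and a large-sample z test on the paired differences plays the role of the analysis method. All function names, parameters, and values (`simulate_paired_scores`, `rejects_null`, `rejection_rate`, `effect`, `rho`, the 1.96 critical value) are assumptions made for illustration.

```python
import math
import random
import statistics


def simulate_paired_scores(n_cases, effect, rho, rng):
    """Toy data: one score per case in each of two modalities.

    A shared case-level term (variance rho) makes the two scores
    correlated; `effect` shifts the second modality. Hypothetical model.
    """
    pairs = []
    for _ in range(n_cases):
        shared = rng.gauss(0.0, math.sqrt(rho))
        a = shared + rng.gauss(0.0, math.sqrt(1.0 - rho))
        b = effect + shared + rng.gauss(0.0, math.sqrt(1.0 - rho))
        pairs.append((a, b))
    return pairs


def rejects_null(pairs, z_crit=1.96):
    """Two-sided large-sample z test on the paired differences."""
    diffs = [b - a for a, b in pairs]
    mean = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return abs(mean / se) > z_crit


def rejection_rate(n_trials, n_cases, effect, rho, seed=0):
    """Fraction of simulated data sets in which the NH is rejected.

    With effect == 0 this estimates the false-positive rate (NH validity);
    with effect != 0 it estimates statistical power.
    """
    rng = random.Random(seed)
    hits = sum(
        rejects_null(simulate_paired_scores(n_cases, effect, rho, rng))
        for _ in range(n_trials)
    )
    return hits / n_trials


if __name__ == "__main__":
    print("NH rejection rate:", rejection_rate(2000, 100, effect=0.0, rho=0.5))
    print("Power:", rejection_rate(2000, 100, effect=0.3, rho=0.5))
```

In this sketch an analysis method would be considered valid under the NH if the first rate is close to the nominal 5% level, and competing methods would be ranked by the second rate at a fixed effect size. Raising `rho` in the toy model increases power because the paired differences become less variable.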
