Performance Studies for Validation of CAD Systems

Evaluation of computational intelligence (CI) systems designed to improve the performance of a human operator is complicated by the need to account for human variability. In this paper we examine the methodology available for addressing this variability in the context of medical imaging computer-aided diagnosis (CAD) systems. We review currently available techniques and give an example using computer simulation, showing how advanced statistical techniques lead to more efficient measures of CAD performance.
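The resampling-based approach to reader and case variability discussed in the abstract can be illustrated with a small sketch. This is a minimal, hypothetical example (not the paper's actual simulation study): the simulated score distributions, the nonparametric AUC estimator, and the two-way bootstrap over readers and cases are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated rating data: rows = readers, columns = cases.
n_readers, n_neg, n_pos = 5, 40, 40
neg = rng.normal(0.0, 1.0, (n_readers, n_neg))   # non-diseased cases
pos = rng.normal(1.0, 1.0, (n_readers, n_pos))   # diseased cases

def auc(neg_scores, pos_scores):
    """Nonparametric (Mann-Whitney) AUC for a single reader."""
    diff = pos_scores[:, None] - neg_scores[None, :]
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

def mean_auc(neg, pos):
    """Reader-averaged AUC, the usual MRMC figure of merit."""
    return np.mean([auc(neg[r], pos[r]) for r in range(neg.shape[0])])

# Two-way bootstrap: resample readers and cases with replacement so the
# variance estimate reflects both sources of random variation.
boot = []
for _ in range(200):
    r = rng.integers(0, n_readers, n_readers)
    i = rng.integers(0, n_neg, n_neg)
    j = rng.integers(0, n_pos, n_pos)
    boot.append(mean_auc(neg[np.ix_(r, i)], pos[np.ix_(r, j)]))

print(f"mean AUC = {mean_auc(neg, pos):.3f}, "
      f"bootstrap SE = {np.std(boot, ddof=1):.3f}")
```

Note that a naive bootstrap of this kind can be biased for MRMC variance components; the methods compared in the paper (jackknife, components-of-variance, one-shot estimators) address exactly this issue.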
