Reader studies for validation of CAD systems

Evaluating computational intelligence (CI) systems designed to improve the performance of a human operator is complicated by the need to account for human variability. In this paper we consider human (reader) variability in the context of medical imaging computer-assisted diagnosis (CAD) systems, and we outline how to compare the detection performance of readers with and without CAD. An effective and statistically powerful comparison can be made with a receiver operating characteristic (ROC) experiment, summarized by the reader-averaged area under the ROC curve (AUC). The comparison requires sophisticated yet well-developed methods for multi-reader multi-case (MRMC) variance analysis, which accounts for random readers, random cases, and the correlations in the experiment. We extend the methods available for estimating this variability: whereas most methods treat only the fully crossed study design, in which every reader reads every case in both experimental conditions, we present a method that can treat arbitrary study designs. We demonstrate our method with a computer simulation and assess the statistical power of a variety of study designs.
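As a rough illustration of the summary statistic named above, the following Python sketch computes the empirical (Mann-Whitney) AUC for each reader and averages it over readers. It covers only the point estimate, not the MRMC variance analysis the paper develops. The function names, the array layout, and the boolean `design` mask (marking which reader read which case, so that designs other than fully crossed can be expressed) are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def empirical_auc(neg_scores, pos_scores):
    """Mann-Whitney AUC estimate: P(pos > neg) + 0.5 * P(pos == neg)."""
    neg = np.asarray(neg_scores, dtype=float)[:, None]  # column of non-diseased scores
    pos = np.asarray(pos_scores, dtype=float)[None, :]  # row of diseased scores
    # Average the success kernel over every (non-diseased, diseased) pair.
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

def reader_averaged_auc(scores, truth, design):
    """Average of per-reader empirical AUCs for one reading condition.

    scores: (n_readers, n_cases) array of reader ratings
    truth:  (n_cases,) array, 1 = diseased, 0 = non-diseased
    design: (n_readers, n_cases) boolean mask (hypothetical layout);
            True where that reader actually read that case
    """
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(truth)
    design = np.asarray(design, dtype=bool)
    aucs = []
    for r in range(scores.shape[0]):
        read = design[r]
        neg = scores[r, read & (truth == 0)]
        pos = scores[r, read & (truth == 1)]
        aucs.append(empirical_auc(neg, pos))
    return float(np.mean(aucs))
```

To compare readers with and without CAD under these assumptions, one would call `reader_averaged_auc` once per condition and take the difference. A fully crossed design corresponds to a `design` mask that is `True` everywhere.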
