Multireader multicase reader studies with binary agreement data: simulation, analysis, validation, and sizing

Abstract. We consider multireader multicase (MRMC) reader studies in which each reader's diagnostic assessment is converted to binary agreement (1: agrees with the truth state; 0: disagrees with the truth state). We present a mathematical model for simulating binary MRMC data with a desired correlation structure across readers, cases, and two modalities, assuming that the expected probability of agreement is equal for the two modalities (P1 = P2). This model can be used to validate the coverage probabilities of 95% confidence intervals (for P1, P2, or P1 − P2 when P1 − P2 = 0), to validate the type I error of a superiority hypothesis test, and to size a noninferiority hypothesis test (which assumes P1 = P2). To illustrate the utility of the simulation model, we adapt the Obuchowski–Rockette–Hillis (ORH) method to the analysis of MRMC binary agreement data. We then use the simulation model to validate the ORH method for binary data and to illustrate sizing in a noninferiority setting. Our software package is publicly available on the Google Code project hosting site for use in the simulation, analysis, validation, and sizing of MRMC reader studies with binary agreement data.
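
As a rough illustration of what simulating correlated binary agreement data with P1 = P2 can look like, the sketch below thresholds a latent Gaussian score built from shared reader, case, and reader-by-case effects. This is not the model of the paper, which the abstract does not specify; the function name `simulate_agreement`, the variance parameters, and the choice to share all random effects across the two modalities are assumptions made only for this example.

```python
from math import sqrt
from statistics import NormalDist

import numpy as np


def simulate_agreement(n_readers, n_cases, p_agree,
                       var_reader, var_case, var_reader_case, rng):
    """Return a (2, n_readers, n_cases) array of 0/1 agreement scores.

    Hypothetical latent-variable sketch: the latent score has unit total
    variance, so thresholding at the p_agree quantile gives the same
    expected agreement for both modalities (P1 = P2 = p_agree).
    """
    var_error = 1.0 - (var_reader + var_case + var_reader_case)
    if var_error < 0:
        raise ValueError("variance components must sum to at most 1")

    # Quantile of the standard normal at p_agree.
    threshold = NormalDist().inv_cdf(p_agree)

    # Random effects shared across modalities (assumption of this sketch).
    reader = rng.standard_normal((1, n_readers, 1))
    case = rng.standard_normal((1, 1, n_cases))
    reader_case = rng.standard_normal((1, n_readers, n_cases))
    # Independent noise for each modality, reader, and case.
    error = rng.standard_normal((2, n_readers, n_cases))

    latent = (sqrt(var_reader) * reader
              + sqrt(var_case) * case
              + sqrt(var_reader_case) * reader_case
              + sqrt(var_error) * error)

    # A reader agrees with the truth state when the latent score is low enough.
    return (latent < threshold).astype(int)


rng = np.random.default_rng(0)
data = simulate_agreement(n_readers=100, n_cases=2000, p_agree=0.8,
                          var_reader=0.05, var_case=0.3,
                          var_reader_case=0.2, rng=rng)
# Percent agreement for each modality, averaged over readers and cases.
p_hat = data.mean(axis=(1, 2))
# Estimated difference P1 - P2, whose true value is 0 in this sketch.
diff_hat = p_hat[0] - p_hat[1]
```

Repeating such a simulation many times and recording how often a 95% confidence interval for P1 − P2 contains 0 would give an empirical coverage estimate, which is the kind of validation the abstract describes.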
