Multireader, multicase receiver operating characteristic methodology: a bootstrap analysis.

RATIONALE AND OBJECTIVES We used bootstrapping to evaluate the conclusions obtained with the Dorfman-Berbaum-Metz (DBM) receiver operating characteristic (ROC) method and with the Toledano-Gatsonis (TG) method on a well-known data set.

METHODS We bootstrapped in two ways: by resampling cases while holding readers fixed, and by resampling both cases and readers.

RESULTS When an analysis of variance of pseudovalues implies that reader variance and all random interactions with treatment are essentially zero, the case-resampling bootstrap and the DBM and TG methods should give the same results. The case-resampling bootstrap and the DBM and TG methods did give highly similar results, both for individual readers and for the averages over all readers. Both the case-resampling bootstrap and the reader-case resampling bootstrap gave smaller standard errors for group means than for individual reader means, thereby providing evidence for a trade-off between readers and cases with regard to precision and power in this data set.

CONCLUSION The case-resampling bootstrap provides some justification for the DBM and TG methods.
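The two resampling schemes described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single treatment, a fully crossed design in which every reader rates every case, the empirical (Mann-Whitney) area under the ROC curve as the accuracy index, and case resampling stratified by truth status. The function names `empirical_auc` and `bootstrap_auc_se` and the synthetic ratings are hypothetical.

```python
import numpy as np


def empirical_auc(scores, truth):
    """Empirical (Mann-Whitney) AUC; tied ratings count one half."""
    pos = scores[truth == 1]
    neg = scores[truth == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def bootstrap_auc_se(ratings, truth, n_boot=500, resample_readers=False, seed=0):
    """Bootstrap standard errors of AUC.

    ratings: array of shape (n_readers, n_cases), one row per reader.
    truth:   array of shape (n_cases,), 1 = abnormal, 0 = normal.
    Returns (individual_se, group_se). individual_se is only meaningful
    when readers are held fixed (resample_readers=False).
    """
    ratings = np.asarray(ratings, dtype=float)
    truth = np.asarray(truth)
    rng = np.random.default_rng(seed)
    n_readers = ratings.shape[0]
    pos_idx = np.flatnonzero(truth == 1)
    neg_idx = np.flatnonzero(truth == 0)

    aucs = np.empty((n_boot, n_readers))
    for b in range(n_boot):
        # Assumption: resample cases with replacement within each truth class.
        cases = np.concatenate([
            rng.choice(pos_idx, size=pos_idx.size),
            rng.choice(neg_idx, size=neg_idx.size),
        ])
        # Optionally resample readers with replacement as well.
        if resample_readers:
            readers = rng.integers(0, n_readers, size=n_readers)
        else:
            readers = np.arange(n_readers)
        for j, r in enumerate(readers):
            aucs[b, j] = empirical_auc(ratings[r, cases], truth[cases])

    individual_se = aucs.std(axis=0, ddof=1)           # SE per reader
    group_se = float(aucs.mean(axis=1).std(ddof=1))    # SE of reader average
    return individual_se, group_se


# Synthetic ratings for illustration only (not the data set from the paper).
demo_rng = np.random.default_rng(1)
demo_truth = np.repeat([0, 1], 30)
demo_ratings = demo_rng.normal(size=(4, 60)) + demo_truth

ind_se, grp_se = bootstrap_auc_se(demo_ratings, demo_truth, n_boot=200)
_, grp_se_rc = bootstrap_auc_se(demo_ratings, demo_truth, n_boot=200,
                                resample_readers=True)
print("individual-reader SEs (cases resampled):", ind_se)
print("group SE (cases resampled):", grp_se)
print("group SE (readers and cases resampled):", grp_se_rc)
```

Comparing `ind_se` with `grp_se` is the kind of check the abstract refers to when it contrasts standard errors for group means and for individual reader means. A comparison between two treatments would apply the same case and reader indices to both treatments within each bootstrap replicate.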
