Hypothesis testing in noninferiority and equivalence MRMC ROC studies.

RATIONALE AND OBJECTIVES Conventional multireader multicase receiver operating characteristic (MRMC ROC) methodologies use hypothesis testing to detect differences in diagnostic accuracy among several imaging modalities. The general MRMC ROC analysis framework is designed to show that the diagnostic accuracy of one modality differs statistically from that of competing modalities (ie, the superiority setting). In practice, one may instead wish to show that the diagnostic accuracy of a modality is noninferior or equivalent, in a statistical sense, to that of another modality, rather than showing its superiority, which is a higher bar. The purpose of this article is to investigate the appropriate adjustments to the conventional MRMC ROC hypothesis testing methodology for the design and analysis of noninferiority and equivalence hypothesis tests.
MATERIALS AND METHODS We present three methodological adjustments to the updated and unified Obuchowski-Rockette (OR)/Dorfman-Berbaum-Metz (DBM) MRMC ROC method for use in statistical noninferiority/equivalence testing: 1) the appropriate statement of the null and alternative hypotheses; 2) a method for analyzing the experimental data; and 3) a method for sizing MRMC noninferiority/equivalence studies. We provide a clinical example to illustrate the analysis of, and the sizing/power calculation for, noninferiority MRMC ROC studies, and we give some insights into the interplay of effect size, noninferiority margin, and sample sizes.
RESULTS We provide detailed analysis and sizing computation procedures for a noninferiority MRMC ROC study using our method, adjusted from the updated and unified OR/DBM MRMC method. We also show that an equivalence hypothesis test is identical to performing two simultaneous noninferiority tests (ie, each modality is noninferior to the other).
CONCLUSION Conventional MRMC ROC methodology developed for superiority studies can and should be adjusted appropriately for the design and analysis of noninferiority/equivalence hypothesis tests. In addition, the confidence interval of the difference in diagnostic accuracies is important information and should generally accompany the statistical analysis and any conclusions drawn from the hypothesis testing.
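As a rough illustration of how a noninferiority test, its confidence interval, and an equivalence test relate, the following is a minimal sketch rather than the OR/DBM procedure itself. It assumes a standard normal reference distribution in place of the t/F distribution and denominator degrees of freedom that the OR/DBM method would supply, and it takes the estimated accuracy difference and its standard error as given. The function names, the margin symbol `margin`, and all numeric inputs are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def noninferiority_test(diff, se, margin, alpha=0.05):
    """One-sided noninferiority test on a difference in accuracy (e.g. AUC).

    diff   : estimated accuracy(new) - accuracy(reference)
    se     : standard error of diff (assumed known; must be > 0)
    margin : noninferiority margin, a positive number

    H0: true difference <= -margin   (new modality is inferior)
    H1: true difference >  -margin   (new modality is noninferior)

    Assumption: a standard normal reference distribution is used here
    as a simplification of the MRMC reference distribution.
    """
    if se <= 0 or margin <= 0:
        raise ValueError("se and margin must be positive")
    std_normal = NormalDist()
    z = (diff + margin) / se                   # distance from the H0 boundary
    p_value = 1.0 - std_normal.cdf(z)          # one-sided p-value
    z_crit = std_normal.inv_cdf(1.0 - alpha)
    # Two-sided (1 - 2*alpha) interval; its lower limit decides the test.
    ci = (diff - z_crit * se, diff + z_crit * se)
    return {"z": z, "p_value": p_value, "ci": ci, "reject_h0": ci[0] > -margin}


def equivalence_test(diff, se, margin, alpha=0.05):
    """Equivalence as two simultaneous noninferiority tests, one per direction."""
    new_vs_ref = noninferiority_test(diff, se, margin, alpha)
    ref_vs_new = noninferiority_test(-diff, se, margin, alpha)
    return new_vs_ref["reject_h0"] and ref_vs_new["reject_h0"]


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any study.
    result = noninferiority_test(diff=-0.01, se=0.02, margin=0.05)
    print(result)
    print("equivalent:", equivalence_test(diff=-0.01, se=0.02, margin=0.05))
```

Rejecting the null hypothesis at level `alpha` coincides with the lower limit of the two-sided `1 - 2*alpha` confidence interval lying above `-margin`, which is one reason to report the interval alongside the test result.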
