Tests of equivalence and non‐inferiority for diagnostic accuracy based on the paired areas under ROC curves

Assessment of equivalence or non‐inferiority in accuracy between two diagnostic procedures often involves comparing paired areas under the receiver operating characteristic (ROC) curves. Given pre‐specified clinically meaningful limits, the current approach to evaluating equivalence is to perform two one‐sided tests (TOST) based on the difference in paired areas under ROC curves estimated by the non‐parametric method. We propose using the standardized difference to assess equivalence or non‐inferiority in diagnostic accuracy between two diagnostic procedures based on paired areas under ROC curves. The bootstrap technique is also suggested for both the non‐parametric method and the standardized difference approach. A simulation study was conducted to investigate the empirical size and power of the four methods across various combinations of distributions, data types, sample sizes, and correlations. Simulation results demonstrate that the bootstrap procedure of the standardized difference approach not only adequately controls the type I error rate at the nominal level but also provides equivalent power under both symmetrical and skewed distributions. A numerical example using published data illustrates the proposed methods. Copyright © 2005 John Wiley & Sons, Ltd.
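As a rough illustration of the TOST idea described above, the following Python sketch estimates each area by the Mann-Whitney statistic, resamples subjects to keep the pairing between the two procedures, and declares equivalence when a percentile bootstrap interval for the difference lies inside the equivalence limits. The function names, the array layout (one row per subject, one column per procedure), the symmetric margin, and the use of a percentile interval are all assumptions made for this example; the abstract does not specify the authors' exact bootstrap procedure, and the standardized difference is not implemented here because it is not defined in the abstract.

```python
import numpy as np


def auc_mann_whitney(diseased, healthy):
    """Non-parametric AUC: P(D > H) + 0.5 * P(D == H) over all pairs."""
    d = np.asarray(diseased, dtype=float)[:, None]
    h = np.asarray(healthy, dtype=float)[None, :]
    return float(np.mean((d > h) + 0.5 * (d == h)))


def bootstrap_auc_differences(diseased, healthy, n_boot=2000, seed=0):
    """Bootstrap the difference in paired AUCs (procedure 1 minus procedure 2).

    diseased, healthy: arrays of shape (n_subjects, 2); the two columns hold
    the scores of the two procedures on the same subject (hypothetical layout).
    Whole rows are resampled so the within-subject correlation is preserved.
    """
    rng = np.random.default_rng(seed)
    diseased = np.asarray(diseased, dtype=float)
    healthy = np.asarray(healthy, dtype=float)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        d = diseased[rng.integers(0, len(diseased), size=len(diseased))]
        h = healthy[rng.integers(0, len(healthy), size=len(healthy))]
        diffs[b] = (auc_mann_whitney(d[:, 0], h[:, 0])
                    - auc_mann_whitney(d[:, 1], h[:, 1]))
    return diffs


def tost_equivalence(diffs, margin, alpha=0.05):
    """TOST at level alpha via a 100(1 - 2*alpha)% percentile interval.

    Equivalence is concluded when the interval lies strictly inside
    (-margin, margin); margin is an assumed symmetric equivalence limit.
    """
    lower, upper = np.quantile(diffs, [alpha, 1.0 - alpha])
    return bool(-margin < lower and upper < margin), (float(lower), float(upper))
```

For non-inferiority, only the lower bound would be compared with the negative margin. The equivalence limits themselves must come from clinical judgment, as the abstract notes.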
