Asymptotic distribution of ∆AUC, NRIs, and IDI based on theory of U‐statistics

The change in area under the curve (∆AUC), the integrated discrimination improvement (IDI), and the net reclassification index (NRI) are commonly used measures of risk prediction model performance. Some authors have reported good validity of the associated methods for estimating their standard errors (SEs) and constructing confidence intervals, whereas others have questioned their performance. To address these issues, we unite the ∆AUC, IDI, and three versions of the NRI under the umbrella of the U-statistics family. We rigorously show that the asymptotic behavior of ∆AUC, NRIs, and IDI fits the asymptotic distribution theory developed for U-statistics. We prove that the ∆AUC, NRIs, and IDI are asymptotically normal, unless they compare nested models under the null hypothesis. In the latter case, asymptotic normality and existing SE estimates cannot be applied to ∆AUC, NRIs, or IDI. In the former case, SE formulas proposed in the literature are equivalent to SE formulas obtained from U-statistics theory if we ignore adjustment for estimated parameters. We use the Sukhatme-Randles-deWet condition to determine when adjustment for estimated parameters is necessary. We show that adjustment is not necessary for SEs of the ∆AUC and two versions of the NRI when added predictor variables are significant and normally distributed. The SEs of the IDI and three-category NRI should always be adjusted for estimated parameters. These results allow us to define, for comparisons of nested models, when existing formulas for SE estimates can be used and when resampling methods such as the bootstrap should be used instead. We also use U-statistics theory to develop a new SE estimate of ∆AUC. Copyright © 2017 John Wiley & Sons, Ltd.
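As a rough illustration of how the ∆AUC fits the U-statistics framework, the sketch below treats each AUC as a two-sample U-statistic and computes a DeLong-type SE for the difference from its structural components. It is not the new SE estimate developed in the paper, and it ignores any adjustment for estimated model parameters. The function names `placement_values` and `delta_auc_with_se` and the toy inputs are hypothetical, chosen only for this example. In line with the abstract, the resulting normal-approximation SE should not be relied on when nested models are compared under the null hypothesis.

```python
import numpy as np


def placement_values(scores_events, scores_nonevents):
    """Return the AUC and its structural components (hypothetical helper).

    The AUC is the two-sample U-statistic with kernel
    psi(x, y) = 1 if x > y, 0.5 if x == y, 0 otherwise,
    averaged over all (event, nonevent) pairs.
    """
    diff = scores_events[:, None] - scores_nonevents[None, :]
    psi = (diff > 0).astype(float) + 0.5 * (diff == 0)
    v10 = psi.mean(axis=1)  # one component per event
    v01 = psi.mean(axis=0)  # one component per nonevent
    return psi.mean(), v10, v01


def delta_auc_with_se(y, risk_old, risk_new):
    """Delta-AUC and a DeLong-type SE, with no adjustment for estimated parameters."""
    y = np.asarray(y).astype(bool)
    risk_old = np.asarray(risk_old, dtype=float)
    risk_new = np.asarray(risk_new, dtype=float)

    auc_old, v10_old, v01_old = placement_values(risk_old[y], risk_old[~y])
    auc_new, v10_new, v01_new = placement_values(risk_new[y], risk_new[~y])

    # Variance of the contrast (new - old): sample variance of the
    # component differences, separately among events and nonevents.
    d10 = v10_new - v10_old
    d01 = v01_new - v01_old
    var = d10.var(ddof=1) / d10.size + d01.var(ddof=1) / d01.size
    return auc_new - auc_old, float(np.sqrt(var))


# Toy data: two events followed by two nonevents (assumed values).
y = [1, 1, 0, 0]
risk_old = [0.6, 0.4, 0.5, 0.3]
risk_new = [0.9, 0.8, 0.2, 0.1]
delta, se = delta_auc_with_se(y, risk_old, risk_new)
print(delta, se)
```

The variance is computed on the differences of the structural components, so the correlation between the two AUCs, which are estimated on the same subjects, is accounted for automatically. When the two sets of predicted risks are identical, both the difference and the SE collapse to zero, a small reminder of why the normal approximation degenerates in the nested-null case.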
