Equivalence of improvement in area under ROC curve and linear discriminant analysis coefficient under assumption of normality

In this paper we investigate the addition of new variables to an existing risk prediction model and the resulting impact on discrimination, quantified by the area under the receiver operating characteristic (ROC) curve (AUC). Practical experience has raised the concern that the statistical significance of a new variable's association with the outcome in the risk model does not correspond to the significance of the change in AUC: often the variable is significant, but the change in AUC is not. We demonstrate that, under the assumption of multivariate normality and with linear discriminant analysis (LDA) used to construct the risk prediction tool, statistical significance of the new predictor(s) is equivalent to statistical significance of the increase in AUC. Under these assumptions the result extends asymptotically to logistic regression. We further show that equality of the variance-covariance matrices of predictors within cases and non-cases is not necessary when LDA is used. However, our practical example from the Framingham Heart Study data suggests that the finding might be sensitive to the assumption of normality.
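The link between the LDA coefficient and the AUC can be illustrated with a minimal numerical sketch. It relies on the standard binormal result that, when predictors are multivariate normal with a common covariance matrix, the LDA score has AUC equal to Φ(Δ/√2), where Δ is the Mahalanobis distance between the case and non-case means. The common-covariance setting, the helper names `normal_cdf` and `lda_auc`, and all parameter values below are assumptions chosen for illustration; they are not taken from the paper or from the Framingham data.

```python
import math
import numpy as np

def normal_cdf(z):
    """Standard normal CDF, written with math.erf to avoid extra dependencies."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lda_auc(mu_cases, mu_noncases, cov):
    """Population AUC of the LDA score under multivariate normality.

    Assumes a common covariance matrix `cov` in cases and non-cases
    (an illustrative simplification). Returns Phi(sqrt(D2 / 2)), where
    D2 is the squared Mahalanobis distance between the group means.
    """
    delta = np.asarray(mu_cases, dtype=float) - np.asarray(mu_noncases, dtype=float)
    d2 = float(delta @ np.linalg.solve(np.asarray(cov, dtype=float), delta))
    return normal_cdf(math.sqrt(d2 / 2.0))

# Hypothetical population parameters: one existing predictor, one new predictor.
mu0 = np.array([0.0, 0.0])                  # non-case means
cov = np.array([[1.0, 0.2],
                [0.2, 1.0]])                # common covariance matrix

# Case means where the new predictor carries information beyond the old one.
mu1_informative = np.array([0.8, 0.3])
# Case means where the new predictor's shift is fully explained by its
# correlation with the old one (0.2 * 0.8), so its LDA coefficient is zero.
mu1_redundant = np.array([0.8, 0.16])

auc_base = lda_auc(mu1_informative[:1], mu0[:1], cov[:1, :1])
auc_informative = lda_auc(mu1_informative, mu0, cov)
auc_redundant = lda_auc(mu1_redundant, mu0, cov)

# LDA coefficients of the full model: Sigma^{-1} (mu1 - mu0).
coef_informative = np.linalg.solve(cov, mu1_informative - mu0)
coef_redundant = np.linalg.solve(cov, mu1_redundant - mu0)

print("baseline AUC:", auc_base)
print("AUC with informative new predictor:", auc_informative, coef_informative)
print("AUC with redundant new predictor:", auc_redundant, coef_redundant)
```

In this sketch the AUC rises only when the new predictor's LDA coefficient is nonzero, and it is unchanged when that coefficient is zero. This is the population-level counterpart of the equivalence described above; it says nothing about finite-sample test behaviour or about departures from normality.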
