Interpreting the concordance statistic of a logistic regression model: relation to the variance and odds ratio of a continuous explanatory variable

Background: When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model.

Methods: An analytical expression was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examined the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal or uniform distribution in the combined sample of those with and without the condition.

Results: Under the assumption of binormality with equality of variances, the c-statistic is given by a standard normal cumulative distribution function whose argument depends on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic is given by a standard normal cumulative distribution function whose argument depends on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, or uniform in the entire sample of those with and without the condition.

Conclusions: The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population.
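To make the binormal relationship concrete, the following is a minimal sketch rather than the authors' code. The abstract does not state the explicit expressions; the sketch assumes the standard binormal results, namely c = Φ(σβ/√2) when both groups share a standard deviation σ and β is the log-odds ratio per unit of the explanatory variable, and c = Φ((μ1 − μ0)/√(σ0² + σ1²)) when the variances differ. The function names, the parameter values, and the rank-based estimate of the empirical c-statistic are all illustrative assumptions.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def predicted_c_equal_var(sigma, log_odds_ratio):
    """Assumed binormal, equal-variance form: c = Phi(sigma * beta / sqrt(2))."""
    return normal_cdf(sigma * log_odds_ratio / math.sqrt(2.0))


def predicted_c_unequal_var(mu0, sd0, mu1, sd1):
    """Assumed binormal, unequal-variance form:
    c = Phi((mu1 - mu0) / sqrt(sd0^2 + sd1^2))."""
    return normal_cdf((mu1 - mu0) / math.sqrt(sd0 ** 2 + sd1 ** 2))


def empirical_c(x0, x1):
    """Empirical c-statistic (Mann-Whitney form): the proportion of
    (non-case, case) pairs in which the case has the larger value.
    Ties are ignored, which is reasonable for continuous data."""
    combined = sorted([(v, 0) for v in x0] + [(v, 1) for v in x1])
    # Sum of the 1-based ranks held by cases in the combined ordering.
    rank_sum_cases = sum(rank for rank, (_, label)
                         in enumerate(combined, start=1) if label == 1)
    n0, n1 = len(x0), len(x1)
    u = rank_sum_cases - n1 * (n1 + 1) / 2.0
    return u / (n0 * n1)


def simulate(sigma, log_odds_ratio, n_per_group=5000, seed=1):
    """Draw both groups from normal distributions with a common sigma.
    Under this model the mean difference equals beta * sigma^2."""
    rng = random.Random(seed)
    mu0 = 0.0
    mu1 = mu0 + log_odds_ratio * sigma ** 2
    x0 = [rng.gauss(mu0, sigma) for _ in range(n_per_group)]
    x1 = [rng.gauss(mu1, sigma) for _ in range(n_per_group)]
    return empirical_c(x0, x1)


if __name__ == "__main__":
    # Same odds ratio (beta = 0.5) in populations of differing heterogeneity.
    for sigma in (0.5, 1.0, 2.0):
        print(sigma, predicted_c_equal_var(sigma, 0.5), simulate(sigma, 0.5))
```

Holding the log-odds ratio fixed while varying σ shows the point of the conclusion: the same odds ratio corresponds to a different c-statistic depending on how heterogeneous the population is.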
