Coefficients of Determination for Multiple Logistic Regression Analysis

Abstract

Coefficients of determination for continuous predicted values (R² analogs) in logistic regression are examined for their conceptual and mathematical similarity to the familiar R² statistic from ordinary least squares (OLS) regression, and are compared with coefficients of determination for discrete predicted values (indexes of predictive efficiency). An example motivated by substantive concerns, using empirical data from a national household probability sample, illustrates how the different coefficients of determination behave when evaluating models of dependent variables with different base rates, that is, different proportions of cases or observations with "positive" outcomes. One R² analog appears preferable to the others, both because it is conceptually closest to the OLS coefficient of determination and because it is relatively independent of the base rate. The base rate should also be considered when selecting an index of predictive efficiency. As expected, conclusions based on R² analogs do not necessarily agree with conclusions based on predictive efficiency about which of several outcomes a given model predicts better.
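To make the two families of coefficients concrete, the following is a minimal sketch, assuming the fitted probabilities come from a maximum likelihood logistic model with an intercept. It computes four commonly cited R² analogs (McFadden's likelihood-ratio R², Cox and Snell's R², Nagelkerke's rescaling of it, and the squared correlation between observed and predicted values) plus two predictive-efficiency indexes, λp and τp, in one common formulation (a 0.5 classification cutoff, with modal-category and proportional-assignment baselines). The function names and toy counts are hypothetical illustrations, not the paper's data, and the sketch does not indicate which analog the paper prefers.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y given probabilities p in (0, 1)."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def squared_correlation(a, b):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov * cov / (va * vb)


def r2_analogs(y, p_hat):
    """R^2 analogs; ASSUMES p_hat are ML fitted values from a model with an intercept."""
    n = len(y)
    base_rate = sum(y) / n
    ll_model = log_likelihood(y, p_hat)
    ll_null = log_likelihood(y, [base_rate] * n)  # intercept-only model
    r2_cs = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    return {
        "R2_L": 1.0 - ll_model / ll_null,                     # likelihood ratio (McFadden)
        "R2_CS": r2_cs,                                        # Cox and Snell
        "R2_N": r2_cs / (1.0 - math.exp(2.0 * ll_null / n)),  # Nagelkerke rescaling
        "R2_O": squared_correlation(y, p_hat),                 # corr(observed, predicted)^2
    }


def predictive_efficiency(y, p_hat, cutoff=0.5):
    """Proportional reduction in classification errors relative to two baselines."""
    n = len(y)
    n1 = sum(y)
    n0 = n - n1
    # Errors with the model: classify as positive when p_hat exceeds the cutoff.
    errors_model = sum((pi > cutoff) != (yi == 1) for yi, pi in zip(y, p_hat))
    # lambda_p baseline: predict the modal category for every case.
    errors_modal = min(n0, n1)
    # tau_p baseline: expected errors when cases are assigned at random
    # in proportion to the observed base rate.
    errors_random = 2.0 * n0 * n1 / n
    return {
        "lambda_p": 1.0 - errors_model / errors_modal,
        "tau_p": 1.0 - errors_model / errors_random,
    }


def grouped_data(groups):
    """Expand hypothetical (size, positives) groups into cases. With a single
    categorical predictor, the ML fitted probability in each group equals the
    group's observed proportion of positives."""
    y, p_hat = [], []
    for size, positives in groups:
        y += [1] * positives + [0] * (size - positives)
        p_hat += [positives / size] * size
    return y, p_hat


if __name__ == "__main__":
    # Hypothetical toy data: two groups of 50 with 10 and 30 positives (base rate 0.4).
    y, p_hat = grouped_data([(50, 10), (50, 30)])
    print(r2_analogs(y, p_hat))
    print(predictive_efficiency(y, p_hat))
```

Rerunning the example with rarer positives (for instance, smaller positive counts in both groups) is a simple way to watch how each coefficient responds to a change in the base rate, the comparison the abstract describes.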
