Explained Residual Variation, Explained Risk, and Goodness of Fit

Abstract: A loss function approach is used to define the concepts of explained residual variation and explained risk for general regression models. Explained risk measures the ability of the covariates in a correctly specified model to distinguish differing outcomes. Explained residual variation, which is R² for a linear model, estimates the explained risk with a penalty for poorly fitting models. The general definitions are applied to linear regression, logistic regression, and survival analysis. The importance of distinguishing the concepts of explained residual variation, explained risk, and goodness of fit is discussed.
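
The abstract does not give the formula, so the following is only a minimal sketch under a stated assumption: that explained residual variation takes the form 1 minus the ratio of total loss with covariates to total loss without them. The function names, the toy data, and the choice of squared-error loss are illustrative and not taken from the paper. With squared-error loss and a least-squares fit, this quantity should coincide with the usual R² of a linear model.

```python
def explained_residual_variation(y, pred_model, pred_null, loss):
    """Assumed form: 1 - (total loss with covariates) / (total loss without)."""
    loss_model = sum(loss(yi, pi) for yi, pi in zip(y, pred_model))
    loss_null = sum(loss(yi, pi) for yi, pi in zip(y, pred_null))
    return 1.0 - loss_model / loss_null


def squared_error(y, pred):
    # Squared-error loss; other losses could be passed in instead.
    return (y - pred) ** 2


def fit_simple_linear(x, y):
    """Ordinary least squares for one covariate plus an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data, for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

intercept, slope = fit_simple_linear(x, y)
fitted = [intercept + slope * xi for xi in x]   # predictions using the covariate
y_mean = sum(y) / len(y)
null_pred = [y_mean] * len(y)                   # predictions ignoring the covariate

erv = explained_residual_variation(y, fitted, null_pred, squared_error)
print(f"explained residual variation = {erv:.4f}")
```

Passing the loss as an argument keeps the sketch open to other choices (for example, absolute error or a deviance-type loss for binary outcomes), which is the sense in which the approach is described as applying to general regression models.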
