The estimation of R² and adjusted R² in incomplete data sets using multiple imputation

The coefficient of determination, also known as R², is a common measure in regression analysis, and many scientists use R² and adjusted R² on a regular basis. Researchers usually treat the coefficient of determination as an index of ‘usefulness’ or ‘goodness of fit,’ and some even treat it as a model selection tool. When the data are incomplete, most researchers and most common statistical software estimate R² by complete case analysis, a procedure that can lead to biased results. In this paper, I introduce the use of multiple imputation for the estimation of R² and adjusted R² in incomplete data sets. I illustrate the methodology using a biomedical example.
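As a rough illustration of the idea, the following Python sketch computes R² on each of several completed (imputed) data sets and combines the results into a single estimate. The abstract does not state the combining rule, so everything specific here is an assumption: the pooling step averages the Fisher z-transform of R = sqrt(R²) across imputations and back-transforms, the adjusted R² is obtained by applying the usual adjustment to the pooled R², and the imputation step is a deliberately crude placeholder (random draws from the observed values' mean and standard deviation) standing in for a proper imputation model. The function names `r_squared`, `adjusted_r_squared`, and `pool_r_squared` are hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X, with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Usual adjustment for n observations and p predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def pool_r_squared(r2_values):
    """Assumed pooling rule: average Fisher's z of R over imputations, back-transform."""
    r = np.sqrt(np.clip(r2_values, 0.0, 1.0 - 1e-12))
    z = np.arctanh(r)
    return float(np.tanh(z.mean()) ** 2)


# Simulated data: two predictors, about 30% of the second one set to missing.
rng = np.random.default_rng(0)
n, p, m = 200, 2, 5  # sample size, predictors, number of imputations
X = rng.normal(size=(n, p))
y = 1.0 + 0.8 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
missing = rng.random(n) < 0.3
observed = X[~missing, 1]

# Placeholder imputation (NOT a recommended model): draw missing values from a
# normal distribution matching the observed mean and standard deviation.
r2_per_imputation = []
for _ in range(m):
    X_completed = X.copy()
    X_completed[missing, 1] = rng.normal(
        observed.mean(), observed.std(ddof=1), size=int(missing.sum())
    )
    r2_per_imputation.append(r_squared(X_completed, y))

pooled = pool_r_squared(r2_per_imputation)
pooled_adj = adjusted_r_squared(pooled, n, p)
print(f"pooled R^2 = {pooled:.3f}, pooled adjusted R^2 = {pooled_adj:.3f}")
```

In practice the placeholder loop would be replaced by completed data sets from a dedicated imputation procedure; only the per-imputation estimation and the final combining step are meant to carry over.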
