Imputation techniques in regression analysis: looking closely at their implementation

A frequent problem in regression analysis is the presence of missing values in the explanatory variables. Imputation is a time-honoured remedy, not least because graphical exploration of a statistical model's properties requires a complete data matrix. This article examines the performance of five imputation techniques under two frequently used implementation procedures. Specifically, imputed values based on both the response and the explanatory variables (type I) are contrasted with those based on the explanatory variables alone (type II). Monte Carlo results indicate that values imputed under the type I procedure may give a spurious impression of high precision, especially as the proportion of missing data increases. Under type II, by contrast, the residual mean square error may be overestimated. The simulations use several matrices of correlation coefficients, and an illustrative real-data example is given.
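The contrast between the two procedures can be made concrete with a small simulation. The sketch below is illustrative only and does not reproduce the article's experiments: it assumes deterministic regression imputation as the technique (the abstract does not name the five techniques studied), two explanatory variables with correlation `rho`, values missing completely at random in `x2`, and a true error variance of 1. All function names and parameter values are hypothetical.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares coefficients of y on X, with an intercept column added."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def ols_predict(coef, X):
    """Fitted values for the rows of X under the coefficients from ols_fit."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def regression_impute(x1, x2, y, use_response):
    """Fill NaNs in x2 with predictions from a regression fitted on complete cases.

    use_response=True  -> type I:  predictors are x1 and the response y
    use_response=False -> type II: predictor is x1 only
    """
    missing = np.isnan(x2)
    predictors = np.column_stack([x1, y]) if use_response else x1.reshape(-1, 1)
    coef = ols_fit(predictors[~missing], x2[~missing])
    filled = x2.copy()
    filled[missing] = ols_predict(coef, predictors[missing])
    return filled

def residual_mean_square(x1, x2, y):
    """Residual mean square of y on (x1, x2), treating imputed values as observed."""
    X = np.column_stack([x1, x2])
    resid = y - ols_predict(ols_fit(X, y), X)
    return resid @ resid / (len(y) - X.shape[1] - 1)

def simulate(n=500, rho=0.5, miss_frac=0.3, seed=0):
    """Assumed data-generating model: y = 1 + x1 + x2 + e, with Var(e) = 1."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    x1, x2 = x[:, 0], x[:, 1].copy()
    y = 1.0 + x1 + x2 + rng.normal(size=n)
    x2[rng.random(n) < miss_frac] = np.nan  # missing completely at random
    return x1, x2, y

x1, x2, y = simulate()
rms_type1 = residual_mean_square(x1, regression_impute(x1, x2, y, True), y)
rms_type2 = residual_mean_square(x1, regression_impute(x1, x2, y, False), y)
print(f"type I: {rms_type1:.3f}  type II: {rms_type2:.3f}")
```

Under these assumptions, the type I residual mean square should tend to fall below the true error variance, because the imputed values already carry information from `y`. The type II value should tend to exceed it, because the imputed rows ignore the part of `x2` that `x1` does not explain. Raising `miss_frac` should widen the gap, in line with the pattern the abstract describes.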
