Missing Data in Regression Analysis

SUMMARY Two alternative methods for dealing with missing observations in regression analysis are investigated. The first is to discard all incomplete observations and to apply ordinary least squares only to the complete observations. The second is to compute the covariance between each pair of variables, each time using only the observations that have values for both variables, and to use these covariances in constructing the system of normal equations. The first method is shown to be equivalent to the Fisher-Yates method of assigning "neutral" values to missing entries in experimental design. The investigation is carried out by means of simulation. Eight sets of regression data were generated, differing from each other with respect to important factors, and various deletion patterns were applied to them. The estimates obtained by applying the two methods to the data with missing entries were compared with the known regression equations. In almost all the cases investigated, the first method (ordinary least squares applied only to the complete observations) is judged superior. However, when the proportion of incomplete observations is high or when the pattern of the missing entries is highly non-random, it seems plausible that one of the many methods of assigning values to the missing entries should be applied instead.
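The two methods compared in the summary can be sketched as follows. This is only an illustrative reading of the summary, not the paper's own procedure: the function names are hypothetical, missing entries are assumed to be coded as NaN, each pairwise covariance is assumed to use means taken over the same pair of observed rows with divisor n, and the intercept in the second method is assumed to come from the available-case means.

```python
import numpy as np

def ols_complete_cases(X, y):
    """Method 1: discard every incomplete observation, then apply OLS."""
    keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    design = np.column_stack([np.ones(keep.sum()), X[keep]])
    beta, *_ = np.linalg.lstsq(design, y[keep], rcond=None)
    return beta  # intercept first, then slopes

def pairwise_cov(Z):
    """Covariance matrix whose (i, j) entry uses only rows observed for both i and j."""
    p = Z.shape[1]
    C = np.empty((p, p))
    for i in range(p):
        for j in range(i, p):
            both = ~np.isnan(Z[:, i]) & ~np.isnan(Z[:, j])
            zi, zj = Z[both, i], Z[both, j]
            # Assumption: means are taken over the same pairwise-complete rows.
            C[i, j] = C[j, i] = np.mean((zi - zi.mean()) * (zj - zj.mean()))
    return C

def ols_pairwise(X, y):
    """Method 2: build the normal equations from pairwise covariances."""
    Z = np.column_stack([X, y])
    k = X.shape[1]
    C = pairwise_cov(Z)
    slopes = np.linalg.solve(C[:k, :k], C[:k, k])
    # Assumption: the intercept uses the mean of each variable over its observed values.
    means = np.nanmean(Z, axis=0)
    intercept = means[k] - means[:k] @ slopes
    return np.concatenate([[intercept], slopes])
```

With no missing entries the two functions return the same coefficients. They diverge once entries are deleted, because the second method draws each covariance from a different subset of rows, so the resulting matrix need not correspond to any single sample.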
