Estimation of parameters and missing values under a regression model with non-normally distributed and non-randomly incomplete data.

We carried out a simulation study to compare the performance of three algorithms (complete cases, ALLVALUE, and expectation maximization, EM) in estimating regression parameters and missing values. The simulated situations varied in the amount of missing data, the distribution (normal, mixture of normals, and lognormal), the pattern of incomplete data (random, related, and censored), and the degree of correlational structure among the dependent and independent variables. When only 5 per cent of the data were incomplete, the EM and complete cases algorithms performed equally well regardless of the correlational structure. When this percentage increased to 25 per cent, the EM algorithm was generally best for estimation, but the complete cases algorithm was safe and conservative. This finding may be attributed to the study design, which required that the slopes be the same in the population of all cases and in the population of complete cases. In addition, the one-step imputing method (ALLVALUE) was competitive only for situations with weak correlational structure and/or little missing data. In those situations the bias caused by using all available information was less than that caused by using only complete cases. For imputation, on the other hand, the EM algorithm performed optimally, even with censored or lognormally distributed data.
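
The contrast between complete-case estimation and EM can be illustrated with a minimal Python sketch. It is not the study's code: the function names (`complete_case_slope`, `em_bivariate_normal`), the bivariate normal model, the sample size, and the choice to delete values of the independent variable completely at random are all assumptions made for illustration. ALLVALUE and the related and censored patterns are not reproduced here.

```python
import numpy as np


def complete_case_slope(x, y):
    """OLS slope of y on x, using only the rows where x is observed."""
    obs = ~np.isnan(x)
    xc, yc = x[obs], y[obs]
    return np.cov(xc, yc, ddof=1)[0, 1] / np.var(xc, ddof=1)


def em_bivariate_normal(x, y, n_iter=200, tol=1e-10):
    """EM for a bivariate normal (x, y) in which some x values are NaN.

    Returns the estimated slope of y on x and a copy of x with the
    missing entries replaced by their conditional means given y.
    Hypothetical helper, assuming y is fully observed.
    """
    miss = np.isnan(x)
    obs = ~miss
    # Starting values: moments of the complete cases.
    mu_x, mu_y = x[obs].mean(), y[obs].mean()
    s_xx, s_yy = x[obs].var(), y[obs].var()
    s_xy = np.mean((x[obs] - mu_x) * (y[obs] - mu_y))

    for _ in range(n_iter):
        # E-step: conditional mean and variance of a missing x given y.
        beta = s_xy / s_yy
        resid_var = s_xx - beta * s_xy
        x_fill = np.where(miss, mu_x + beta * (y - mu_y), x)

        # M-step: update the moments from the expected sufficient statistics.
        new_mu_x, new_mu_y = x_fill.mean(), y.mean()
        dx, dy = x_fill - new_mu_x, y - new_mu_y
        new = (
            new_mu_x,
            new_mu_y,
            np.mean(dx * dx) + miss.mean() * resid_var,  # adds imputation variance
            np.mean(dx * dy),
            np.mean(dy * dy),
        )
        old = (mu_x, mu_y, s_xx, s_xy, s_yy)
        change = max(abs(a - b) for a, b in zip(new, old))
        mu_x, mu_y, s_xx, s_xy, s_yy = new
        if change < tol:
            break

    # Final imputation with the converged parameters.
    x_imputed = np.where(miss, mu_x + (s_xy / s_yy) * (y - mu_y), x)
    return s_xy / s_xx, x_imputed


# Illustrative data: true slope 2.0, 25 per cent of x deleted at random.
rng = np.random.default_rng(0)
n = 2000
x_full = rng.normal(size=n)
y_demo = 1.0 + 2.0 * x_full + rng.normal(size=n)
x_demo = x_full.copy()
x_demo[rng.random(n) < 0.25] = np.nan

print("complete cases slope:", complete_case_slope(x_demo, y_demo))
print("EM slope:", em_bivariate_normal(x_demo, y_demo)[0])
```

Under this completely random deletion, both estimators target the same slope, so the sketch shows the mechanics only. The differences reported above arise under the other missing-data patterns and distributions.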