Imputing cross-sectional missing data: comparison of common techniques.

OBJECTIVE Growing awareness of how missing data affect the analysis of clinical and public health interventions has led to a growing number of missing data procedures. There is little advice on which procedure should be selected under which circumstances. This paper compares six popular procedures: listwise deletion, item mean substitution, person mean substitution at two levels, regression imputation and hot deck imputation.

METHOD Using a complete dataset, each procedure was examined under a variety of sample sizes and differing levels of missing data. The criteria for comparison were the true t-values for the entire sample.

RESULTS The results suggest important differences among the procedures. If missing data are from a scale where about half the items are present, hot deck imputation or person mean substitution is best. Because person mean substitution is computationally simpler, similar in efficiency, advocated by other researchers and more likely to be an option in statistical software packages, it is the method of choice. If the missing data are from a scale where more than half the items are missing, or from single-item measures, then hot deck imputation is recommended. The findings also showed that listwise deletion and item mean substitution performed poorly.

CONCLUSIONS Person mean substitution and hot deck imputation are preferred. Since listwise deletion and item mean substitution performed poorly, yet are the most widely reported methods, the findings have broad implications.
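
To make the compared procedures concrete, the following is a minimal Python sketch of four of them (listwise deletion, item mean substitution, person mean substitution and hot deck imputation) on a small invented dataset. It is an illustration under stated assumptions, not the implementation used in the study: the toy `scores` data, the function names, the `min_present` threshold (one possible reading of the "levels" of person mean substitution) and the choice of a simple random-donor hot deck are all hypothetical. Regression imputation is omitted for brevity.

```python
import random
from statistics import mean

# Toy scale data (invented for illustration): rows are respondents,
# columns are scale items, and None marks a missing answer.
scores = [
    [4, 5, 4, 5],
    [2, None, 3, 2],
    [None, None, None, 1],
    [3, 3, None, 4],
]

def listwise_deletion(rows):
    """Drop every respondent who has at least one missing item."""
    return [row[:] for row in rows if None not in row]

def item_mean_substitution(rows):
    """Fill a missing item with that item's mean over respondents who answered it."""
    n_items = len(rows[0])
    col_means = [mean(r[j] for r in rows if r[j] is not None) for j in range(n_items)]
    return [[col_means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

def person_mean_substitution(rows, min_present=0.5):
    """Fill a missing item with the respondent's own mean over answered items.

    Assumption: substitution is applied only when at least `min_present`
    of the items were answered; other rows are left unchanged.
    """
    out = []
    for row in rows:
        observed = [v for v in row if v is not None]
        if observed and len(observed) / len(row) >= min_present:
            person_mean = mean(observed)
            out.append([person_mean if v is None else v for v in row])
        else:
            out.append(row[:])
    return out

def hot_deck_imputation(rows, seed=0):
    """Fill a missing item with the value of a randomly chosen donor.

    Assumption: a simple random hot deck, where any respondent who
    answered the item can act as donor. Other variants pick the donor
    from respondents who are similar on the observed items.
    """
    rng = random.Random(seed)
    n_items = len(rows[0])
    donors = [[r[j] for r in rows if r[j] is not None] for j in range(n_items)]
    return [[rng.choice(donors[j]) if v is None else v for j, v in enumerate(row)]
            for row in rows]

if __name__ == "__main__":
    print("listwise:   ", listwise_deletion(scores))
    print("item mean:  ", item_mean_substitution(scores))
    print("person mean:", person_mean_substitution(scores))
    print("hot deck:   ", hot_deck_imputation(scores))
```

In this sketch, listwise deletion keeps only the single complete respondent, which shows how quickly it discards data. Item mean substitution gives every respondent the same filled-in value for a given item, whereas person mean substitution and hot deck imputation fill in values that vary across respondents.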