A Review of the Literature on Missing Data.

Running head: MISSING DATA

Jesus Tanguma
University of Houston Clear Lake

Paper presented at the annual meeting of the Mid-South Educational Research Association, Bowling Green, KY, November 16, 2000.

This paper reviews the literature on methods for dealing with missing data, discusses four commonly used methods, and illustrates them with a small hypothetical data set. Most studies contain some missing data, and the reasons data are missing are many and varied. Four commonly used methods have been identified in the literature: (1) listwise deletion; (2) pairwise deletion; (3) mean imputation; and (4) regression imputation. Listwise deletion is the most commonly used method, largely because it is the default in some statistical packages (e.g., the Statistical Package for the Social Sciences and the Statistical Analysis System). However, because listwise deletion eliminates all data for any participant who is missing a value on any predictor or criterion variable, it is not the most effective method. Pairwise deletion computes each correlation from the observations that have no missing values on the two variables involved. Thus, it preserves information that would have been lost under listwise deletion. However, because different sample sizes go into computing the correlations, the resulting correlation matrix may not be positive definite (a condition that any valid, invertible correlation matrix must satisfy). In mean imputation, the mean of a particular variable, computed from the available cases, is substituted for the missing values on the remaining cases. This allows the researcher to use the rest of the participant's data. A regression-based procedure estimates the missing values in a way that takes into account the relationships among the variables. Thus, substitution by regression is more statistically efficient. (Contains 1 figure, 7 tables, and 15 references.) (Author/SLD)
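The four methods described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the five-case data set and the variable names x, y, and z are hypothetical (they are not the paper's data), the function names are invented for this sketch, and regression imputation is shown in its simplest form, with one predictor fitted by ordinary least squares.

```python
from statistics import mean

# Hypothetical toy data (NOT the paper's data set); None marks a missing value.
rows = [
    {"x": 1.0, "y": 2.0, "z": 1.0},
    {"x": 2.0, "y": None, "z": 3.0},
    {"x": 3.0, "y": 6.0, "z": None},
    {"x": 4.0, "y": 8.0, "z": 4.0},
    {"x": 5.0, "y": 10.0, "z": 6.0},
]


def listwise(rows):
    """Listwise deletion: keep only cases observed on every variable."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


def pairwise_corr(rows, a, b):
    """Pairwise deletion: correlate a and b on all cases observed on both.

    Returns (r, n) so the differing sample sizes are visible.
    """
    pairs = [(r[a], r[b]) for r in rows
             if r[a] is not None and r[b] is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys), len(pairs)


def mean_impute(rows, var):
    """Mean imputation: replace missing values of var with its observed mean."""
    m = mean(r[var] for r in rows if r[var] is not None)
    return [dict(r, **{var: m if r[var] is None else r[var]}) for r in rows]


def regression_impute(rows, target, predictor):
    """Regression imputation (single predictor, an assumption of this sketch):
    fit target = intercept + slope * predictor on complete pairs, then
    predict the missing target values."""
    pairs = [(r[predictor], r[target]) for r in rows
             if r[predictor] is not None and r[target] is not None]
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    slope = (sum((a - mx) * (b - my) for a, b in pairs)
             / sum((a - mx) ** 2 for a in xs))
    intercept = my - slope * mx
    filled = []
    for r in rows:
        if r[target] is None and r[predictor] is not None:
            r = dict(r, **{target: intercept + slope * r[predictor]})
        filled.append(r)
    return filled


if __name__ == "__main__":
    print("listwise cases kept:", len(listwise(rows)))
    print("pairwise r(x, y), n:", pairwise_corr(rows, "x", "y"))
    print("pairwise r(x, z), n:", pairwise_corr(rows, "x", "z"))
    print("mean-imputed y:", [r["y"] for r in mean_impute(rows, "y")])
    print("regression-imputed y:",
          [r["y"] for r in regression_impute(rows, "y", "x")])
```

In this toy example listwise deletion retains fewer cases than either pairwise correlation uses, which is the loss of information the abstract refers to. Mean imputation fills the missing y with the same value regardless of x, whereas the regression fill depends on the case's x value.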