A Review of the Literature on Missing Data

Jesus Tanguma
University of Houston-Clear Lake

Paper presented at the annual meeting of the Mid-South Educational Research Association, Bowling Green, KY, November 16, 2000.

This paper reviews the literature on methods for dealing with missing data, discusses four commonly used methods, and illustrates these approaches with a small hypothetical data set. Most studies contain some missing data, and the reasons data are missing are many and varied. Four commonly used methods have been identified in the literature: (1) listwise deletion; (2) pairwise deletion; (3) mean imputation; and (4) regression imputation. Listwise deletion is the default in some statistical packages (e.g., the Statistical Package for the Social Sciences and the Statistical Analysis System) and, largely for that reason, is the most commonly used method. However, because listwise deletion eliminates every case that is missing data on any predictor or criterion variable, it is not the most effective method. Pairwise deletion computes each correlation from the observations that have no missing values on the two variables involved. Thus, it preserves information that would have been lost under listwise deletion. However, because different sample sizes go into the computation of the different correlations, the resulting correlation matrix may not be positive definite (a mathematical condition required to invert the correlation matrix). In mean imputation, the mean of a particular variable, computed from the available cases, is substituted for the missing values of that variable on the remaining cases. This allows the researcher to use the rest of the participant's data. When a regression-based procedure is used to estimate the missing values, the estimation takes into account the relationships among the variables. Thus, substitution by regression is more statistically efficient. (Contains 1 figure, 7 tables, and 15 references.) (Author/SLD)
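The four methods named in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's own analysis: the data set below is invented for this example, all function names are hypothetical, and the regression step uses a single predictor as a simplifying assumption.

```python
from math import sqrt

# Hypothetical data (not from the paper): one row per participant,
# columns are x, y, z; None marks a missing value.
DATA = [
    (1.0, 2.0, 1.0),
    (2.0, None, 2.0),
    (3.0, 4.0, None),
    (4.0, 5.0, 3.0),
    (5.0, 9.0, 5.0),
]


def mean(values):
    return sum(values) / len(values)


def listwise_deletion(rows):
    """Keep only the rows with no missing value on any variable."""
    return [r for r in rows if None not in r]


def pairwise_cases(rows, i, j):
    """Return the (value_i, value_j) pairs observed together in columns i and j."""
    return [(r[i], r[j]) for r in rows if r[i] is not None and r[j] is not None]


def pearson(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(rows, n_vars=3):
    """Correlation and sample size per variable pair, using every case
    available for that pair (so n can differ from pair to pair)."""
    out = {}
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            pairs = pairwise_cases(rows, i, j)
            out[(i, j)] = (pearson(pairs), len(pairs))
    return out


def mean_imputation(rows, col):
    """Replace missing values in column col with the mean of the observed values."""
    fill = mean([r[col] for r in rows if r[col] is not None])
    return [r[:col] + (fill,) + r[col + 1:] if r[col] is None else r for r in rows]


def regression_imputation(rows, target, predictor):
    """Replace missing target values with predictions from a simple linear
    regression on one predictor, fitted to the cases observed on both."""
    pairs = pairwise_cases(rows, predictor, target)
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return [
        r[:target] + (intercept + slope * r[predictor],) + r[target + 1:]
        if r[target] is None and r[predictor] is not None else r
        for r in rows
    ]


complete = listwise_deletion(DATA)
pairwise = pairwise_correlations(DATA)
y_mean_filled = mean_imputation(DATA, col=1)
y_reg_filled = regression_imputation(DATA, target=1, predictor=0)
```

On this invented data, listwise deletion retains 3 of the 5 cases, while pairwise deletion bases the x-y and x-z correlations on 4 cases each and the y-z correlation on 3. Those unequal sample sizes are the reason the abstract gives for why the assembled matrix may fail to be positive definite. Mean imputation fills the missing y with 5.0 regardless of that participant's other scores, whereas the regression fill uses the participant's x value and gives approximately 3.0.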