Imputing Missing Data: A Comparison of Methods for Social Work Researchers

Choosing the most appropriate method to handle missing data during analyses is one of the most challenging decisions confronting researchers. Often, missing values are just ignored rather than replaced with a reliable imputation method. Six methods of data imputation were used to replace missing data from two data sets of varying sizes; this article examines the results. Each imputation method is defined, and the pros and cons of its use in social science research are identified. The authors discuss comparisons of descriptive measures and multivariate analyses with the imputed variables and the results of a timed study to determine how long it took to use each imputation method on first and subsequent use. Implications for social work research are suggested.

KEY WORDS: data analysis; data imputation methods; missing data; research methods

"Five hundred high school students completed the longitudinal study ... The analysis suggests that a significant difference was found between ..." These hypothetical results may appear to be positive, but the researcher failed to report that originally 850 students were in the study, and that each year 5% to 6% of the sample could not be found because they had moved, no longer had a phone, or chose not to participate. Furthermore, because of incomplete data for some variables, researchers had to drop other cases from the analysis. So in reality, more than 50% of the original sample might not be included, or accounted for, in this statement. It is possible that the participants not included in the final analysis have different characteristics from those who were included. How does this dearth of data affect the outcomes reported?

Unfortunately, this scenario is all too common in the social work research reported in the literature. This article summarizes the hazards of ignoring missing data and identifies six data imputation methods that can resolve this problem.
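The arithmetic behind the hypothetical longitudinal study above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `excluded_share` is an assumption introduced here, and the only figures used are the 850 enrolled and 500 completing students from the example.

```python
def excluded_share(enrolled, analyzed):
    """Fraction of the original sample that is absent from the analysis."""
    return (enrolled - analyzed) / enrolled

enrolled = 850   # students originally in the hypothetical study
completed = 500  # students reported as completing it

# Share of the original sample lost to attrition alone.
lost_to_attrition = excluded_share(enrolled, completed)

# Number of completers that must also be dropped for incomplete data
# before exactly half of the original sample is excluded.
break_even_drops = completed - enrolled / 2

print(f"{lost_to_attrition:.1%}")  # 41.2%
print(break_even_drops)            # 75.0
```

Attrition alone removes about 41% of the original sample, so dropping more than 75 of the 500 completers (15% of them) for incomplete data would push the excluded share past one half.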
To examine how results might differ based on the imputation procedure selected, each of these methods was used on two different data sets, each with missing values. The results demonstrate the importance of dealing with missing data and the many issues confronting the social work researcher in this regard.

The researcher's goal is to conduct the most accurate analysis of the data to make valid and efficient inferences about a population to guide practitioners and researchers alike (Schafer & Graham, 2002). Accomplishing this goal requires choosing the most appropriate method to handle missing data. Too often, social work researchers ignore missing data and their effects on data analysis, which limits their ability to achieve this goal. Missing data are typically ignored when researchers fail to understand the significance of the problem or are unaware of the available solutions (Figueredo, McKnight, McKnight, & Sidani, 2000).

The handling of missing data is not typically addressed in research reports, and a review of the literature bears this out. Of approximately 100 articles reviewed between 2001 and 2003 from three social work research journals (Journal of Social Service Research, Social Work, and Social Work Research), only 15 percent reported any information about the amount of missing data or how missing data were handled in the analysis. Because virtually all social science survey research involves some incomplete data, the treatment of missing data should be a universal concern and should be addressed in all research reports.

Numerous methods exist to handle the problem of missing data. They include both "old" methods requiring just a few mathematical computations and "new" methods requiring more complex computations that are increasingly easy for social work researchers to perform with statistical programming software.
Here we examine the traditional methods, including listwise deletion (the least sophisticated method), mean substitution, hotdecking, and regression imputation. …
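As a rough illustration of the four traditional methods named above, the following Python sketch applies each one to a small toy data set. It is not the authors' procedure: the records, function names, and variable names are hypothetical, the hot-deck version is a simple random-donor variant, and the regression version assumes a single predictor fitted by ordinary least squares.

```python
import random
from statistics import mean

# Hypothetical toy records: (age, score); None marks a missing score.
records = [(20, 10.0), (30, 14.0), (40, None), (50, 22.0), (60, None)]

def listwise_deletion(rows):
    """Drop every case that has any missing value."""
    return [r for r in rows if None not in r]

def mean_substitution(rows):
    """Replace each missing score with the mean of the observed scores."""
    observed_mean = mean(y for _, y in rows if y is not None)
    return [(x, observed_mean if y is None else y) for x, y in rows]

def hot_deck(rows, seed=0):
    """Replace each missing score with the score of a randomly drawn
    observed case (the donor). Assumption: donors are not matched on
    other characteristics, as a fuller hot-deck procedure might do."""
    rng = random.Random(seed)
    donors = [y for _, y in rows if y is not None]
    return [(x, rng.choice(donors) if y is None else y) for x, y in rows]

def regression_imputation(rows):
    """Predict each missing score from age, using a least-squares line
    fitted to the complete cases only."""
    complete = listwise_deletion(rows)
    mean_x = mean(x for x, _ in complete)
    mean_y = mean(y for _, y in complete)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in complete)
             / sum((x - mean_x) ** 2 for x, _ in complete))
    intercept = mean_y - slope * mean_x
    return [(x, intercept + slope * x if y is None else y) for x, y in rows]

for method in (listwise_deletion, mean_substitution,
               hot_deck, regression_imputation):
    print(method.__name__, method(records))
```

Listwise deletion shrinks the sample from five cases to three, whereas the other three methods keep all five cases and differ only in the values they fill in. Mean substitution gives every missing case the same value, which understates the variability of the imputed variable.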

[1] Russell V. Lenth, et al. Statistical Analysis With Missing Data (2nd ed.) (Book). 2004.

[2] Nicole A. Lazar, et al. Statistical Analysis With Missing Data. 2003, Technometrics.

[3] E. Rubin, et al. Comorbid medical conditions among depressed elderly patients discharged home after acute psychiatric care. 2003, The American journal of geriatric psychiatry : official journal of the American Association for Geriatric Psychiatry.

[4] J. Schafer, et al. Missing data: our view of the state of the art. 2002, Psychological methods.

[5] David L Streiner, et al. The Case of the Missing Data: Methods of Dealing with Dropouts and other Research Vagaries. 2002, Canadian journal of psychiatry. Revue canadienne de psychiatrie.

[6] Therese D. Pigott, et al. A Review of Methods for Missing Data. 2001.

[7] John W. Graham, et al. Planned missing-data designs in analysis of change. 2001.

[8] Patrick E. McKnight, et al. Multivariate modeling of missing data within and across assessment waves. 2000, Addiction.

[9] Haiyang Li, et al. Service needs of depressed older adults following acute psychiatric care. 2000.

[10] Paul W. Thurston, et al. A Monte Carlo Study of Missing Item Methods. 2000.

[11] Sharon D Johnson, et al. Measuring Neighborhood and School Environments Perceptual and Aggregate Approaches. 2000.

[12] John W. Graham, et al. Multiple imputation in multivariate research. 2000.

[13] Jürgen Baumert, et al. Modeling longitudinal and multilevel data: Practical issues, applied approaches, and specific examples. 2000.

[14] Q. Raaijmakers, et al. Effectiveness of Different Missing Data Treatments in Surveys with Likert-Type Data: Introducing the Relative Mean Substitution Approach. 1999.

[15] L. Kurlowicz, et al. The Mini Mental State Examination (MMSE). 1999, Director.

[16] S. Johnson, et al. Impact of environment on adolescent mental health and behavior: structural equation modeling. 1999, The American journal of orthopsychiatry.

[17] John G. Orme, et al. Multiple Regression with Missing Data. 1991.

[18] Donald B. Rubin, et al. EM and beyond. 1991.

[19] M. Folstein, et al. Mini-Mental State Examination (MMSE). 2019, Encyclopedia of Gerontology and Population Aging.

[20] J. Yesavage, et al. Geriatric Depression Scale (GDS): Recent evidence and development of a shorter version. 1986.

[21] M. Aldenderfer, et al. Cluster Analysis. Sage University Paper Series On Quantitative Applications in the Social Sciences 07-044. 1984.

[22] B. Tabachnick, et al. Using Multivariate Statistics. 1983.