Six: Dealing with Missing or Incomplete Data: Debunking the Myth of Emptiness

In almost any research you perform, there is the potential for missing or incomplete data. Missing data can occur for many reasons: participants can fail to respond to questions (legitimately or illegitimately; more on that later), equipment and data collection or recording mechanisms can malfunction, subjects can withdraw from studies before they are completed, and data entry errors can occur. In later chapters I discuss the elimination of extreme scores and outliers, which can also lead to missingness. The issue with missingness is that nearly all classic and modern statistical techniques assume (or require) complete data, and most common statistical packages default to the least desirable option for dealing with missing data: deletion of the case from the analysis. Most people analyzing quantitative data allow the software to default to eliminating important data from their analyses, even though the deleted individual or case may have a good deal of other data to contribute to the overall analysis. It is my argument in this chapter that all researchers should examine their data for missingness, and that researchers wanting the best (i.e., the most replicable and generalizable) results from their research need to be prepared to deal with missing data in the most appropriate and desirable way possible. In this chapter I briefly review common reasons for missing (or incomplete) data, compare and contrast several common methods for dealing with missingness, and demonstrate some of the benefits of using more modern methods (and some drawbacks of using the traditional, default methods) in the search for the best, most scientific outcomes for your research.
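To make the cost of the default concrete, here is a minimal sketch in plain Python of what case deletion (often called listwise deletion or complete-case analysis) does to a small data set. The records, variable names, and helper functions are hypothetical and exist only for illustration; they do not come from the chapter or from any particular statistical package.

```python
# Hypothetical survey records; None marks a missing response.
records = [
    {"age": 34, "income": 52000, "score": 88},
    {"age": 29, "income": None, "score": 92},
    {"age": 41, "income": 61000, "score": None},
    {"age": 37, "income": 58000, "score": 75},
    {"age": None, "income": 47000, "score": 81},
]


def listwise_delete(rows):
    """Keep only the cases that have no missing values at all."""
    return [row for row in rows if all(v is not None for v in row.values())]


def count_observed(rows):
    """Count the non-missing values across all cases."""
    return sum(v is not None for row in rows for v in row.values())


complete_cases = listwise_delete(records)
dropped_cases = [row for row in records if row not in complete_cases]

# Observed values that are thrown away along with the dropped cases.
observed_values_discarded = count_observed(dropped_cases)

print(len(complete_cases), "of", len(records), "cases retained")
print(observed_values_discarded, "of", count_observed(records),
      "observed values discarded")
```

In this toy example only three values are actually missing, yet deleting the affected cases retains two of the five cases and discards six of the twelve values that were observed. That is the sense in which a deleted case still had other data to contribute.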
