Missing Data

Missing data (a) reside at three levels of analysis (item, construct, and person level), (b) arise from three missing data mechanisms (missing completely at random, missing at random, and missing not at random) that range from completely random to systematic missingness, (c) can engender two missing data problems (biased parameter estimates, and inaccurate hypothesis tests [inaccurate standard errors, low power]), and (d) mandate a choice among several missing data treatments (listwise deletion, pairwise deletion, single imputation, maximum likelihood, and multiple imputation). Although all missing data treatments are imperfect and rest on particular statistical assumptions, some are worse than others on average (i.e., they lead to more bias in parameter estimates and less accurate hypothesis tests). Social scientists still routinely choose the more biased and error-prone techniques (listwise and pairwise deletion), likely because of poor familiarity with, and misconceptions about, the less biased and less error-prone techniques (maximum likelihood and multiple imputation). This user-friendly review provides five easy-to-understand practical guidelines, with the goal of reducing missing data bias and error in the reporting of research results. Syntax is provided for correlation, multiple regression, and structural equation modeling with missing data.
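
As a rough illustration of how the three mechanisms interact with listwise deletion, the following Python sketch simulates two correlated variables and deletes values of y under each mechanism. It is not the syntax provided in the review; the variable names, the 0.5 slope, the 50% missingness rate, and the regression-based adjustment (a simple stand-in for the idea of using observed information, not maximum likelihood or multiple imputation themselves) are all assumptions chosen for illustration.

```python
import random
import statistics


def complete_case_mean(y, observed):
    """Mean of y using only the observed cases (listwise deletion)."""
    return statistics.fmean(yi for yi, o in zip(y, observed) if o)


def regression_adjusted_mean(x, y, observed):
    """Fit y ~ x on observed cases, then average predictions over all x."""
    xs = [xi for xi, o in zip(x, observed) if o]
    ys = [yi for yi, o in zip(y, observed) if o]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    # x is fully observed, so its full-sample mean can be used here.
    return intercept + slope * statistics.fmean(x)


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]        # always observed
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]   # subject to missingness
full_mean = statistics.fmean(y)

# True marks a case whose y value is observed.
observed = {
    "MCAR": [rng.random() < 0.5 for _ in range(n)],  # unrelated to x or y
    "MAR": [xi <= 0 for xi in x],                    # depends on observed x
    "MNAR": [yi <= 0 for yi in y],                   # depends on y itself
}

results = {
    name: (complete_case_mean(y, obs), regression_adjusted_mean(x, y, obs))
    for name, obs in observed.items()
}

print(f"full-data mean of y: {full_mean:.3f}")
for name, (cc, adj) in results.items():
    print(f"{name}: listwise mean = {cc:.3f}, regression-adjusted mean = {adj:.3f}")
```

In this setup the listwise mean should stay close to the full-data mean only under MCAR. Under MAR, the adjustment that uses the observed x should recover it approximately, while under MNAR both estimates remain biased, which is consistent with the abstract's point that every treatment rests on particular assumptions.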
