Postmodeling Sensitivity Analysis to Detect the Effect of Missing Data Mechanisms

Incomplete or missing data are a common problem in almost all areas of empirical research. It is well known that simple, ad hoc methods such as complete case analysis or mean imputation can lead to biased and/or inefficient estimates. Maximum likelihood works well when the data are missing completely at random (MCAR) or missing at random (MAR); when the missing data mechanism is neither, however, it too can result in incorrect inference. Statistical tests for MCAR have been proposed, but these are restricted to a certain class of problems. Sensitivity analysis as a means of detecting the missing data mechanism has been proposed in the statistics literature in conjunction with selection models, in which the data and the missing data mechanism are modeled jointly. Our approach differs in that we do not model the missing data mechanism; instead, we use the data at hand to examine the sensitivity of a given model to that mechanism. Our methodology is meant to raise a flag for researchers when the assumption of MCAR (or MAR) does not hold. To our knowledge, no specific proposal for sensitivity analysis has been set forth in the area of structural equation models (SEM). This article gives a specific method for performing postmodeling sensitivity analysis using a statistical test and graphs. A simulation study is performed to assess the methodology in the context of structural equation models. The study shows that the method is successful, especially when the sample size is 300 or more and the percentage of missing data is 20% or more. The method is also applied, using a factor analysis model, to a set of real data measuring physical and social self-concepts in 463 Nigerian adolescents.
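To make the opening claim about ad hoc methods concrete, the following is a minimal sketch in Python (standard library only). It does not implement the sensitivity analysis proposed in the article. The data-generating model, the missingness rates, and all function names (`simulate`, `delete_mcar`, `delete_mar`, `complete_case_mean`, `mean_imputed_variance`) are assumptions chosen for illustration. It generates a pair of correlated variables, deletes values of `y` under an MCAR rule and under a MAR rule that depends only on the fully observed `x`, and then compares the complete case mean and the variance after mean imputation with their full-data counterparts.

```python
import random
import statistics


def simulate(n=20000, seed=0):
    """Draw (x, y) pairs with y = 0.8*x + noise, so E[y] = 0 and Var[y] = 1."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [0.8 * x + rng.gauss(0, 0.6) for x in xs]
    return xs, ys, rng


def delete_mcar(ys, rng, rate=0.3):
    """MCAR: every y is dropped with the same probability, unrelated to the data."""
    return [None if rng.random() < rate else y for y in ys]


def delete_mar(xs, ys, rng, rate_high=0.6, rate_low=0.1):
    """MAR: the chance that y is missing depends only on the observed x."""
    return [
        None if rng.random() < (rate_high if x > 0 else rate_low) else y
        for x, y in zip(xs, ys)
    ]


def complete_case_mean(ys_missing):
    """Mean of y over the cases where y was observed."""
    return statistics.fmean([y for y in ys_missing if y is not None])


def mean_imputed_variance(ys_missing):
    """Variance of y after replacing each missing value with the observed mean."""
    observed = [y for y in ys_missing if y is not None]
    fill = statistics.fmean(observed)
    filled = [fill if y is None else y for y in ys_missing]
    return statistics.pvariance(filled)


if __name__ == "__main__":
    xs, ys, rng = simulate()
    y_mcar = delete_mcar(ys, rng)
    y_mar = delete_mar(xs, ys, rng)
    print("full-data mean of y:       ", round(statistics.fmean(ys), 3))
    print("complete case mean, MCAR:  ", round(complete_case_mean(y_mcar), 3))
    print("complete case mean, MAR:   ", round(complete_case_mean(y_mar), 3))
    print("full-data variance of y:   ", round(statistics.pvariance(ys), 3))
    print("variance after mean imput.:", round(mean_imputed_variance(y_mcar), 3))
```

Under these assumed settings, the complete case mean stays close to the full-data mean when deletion is MCAR but shifts noticeably under the MAR rule, and mean imputation shrinks the variance even under MCAR. An analysis that also uses `x`, such as maximum likelihood, could correct the MAR case; the situation the article targets is the harder one in which neither MCAR nor MAR holds.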
