SEM with Missing Data and Unknown Population Distributions Using Two-Stage ML: Theory and Its Application

This article provides the theory and application of the 2-stage maximum likelihood (ML) procedure for structural equation modeling (SEM) with missing data. The validity of this procedure does not require the assumption of a normally distributed population. When the population is normally distributed and all missing data are missing at random (MAR), the direct ML procedure is nearly optimal for SEM with missing data. When missing data mechanisms are unknown, including auxiliary variables in the analysis makes the missing data mechanism more likely to be MAR, and auxiliary variables are much easier to include in the 2-stage ML than in the direct ML. Drawing on the most recent developments for missing data with an unknown population distribution, the article first gives a minimally technical account of why the normal distribution-based ML generates consistent parameter estimates when the missing data mechanism is MAR. It then provides sufficient conditions for the 2-stage ML to be a valid statistical procedure in the general case. For the application of the 2-stage ML, an SAS IML program is given to perform the first-stage analysis, and EQS code is provided to perform the second-stage analysis. An example with open- and closed-book examination data illustrates the application of the provided programs. One aim is for quantitative graduate students and applied psychometricians to understand the technical details of missing data analysis. Another aim is for applied researchers to use the method properly.
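
To make the first stage concrete, the following is a minimal sketch, not the article's SAS IML program. It assumes the simplest possible setting: two variables, with `x` always observed and `y` missing for some cases. In that monotone pattern the normal-theory ML estimates of the saturated means and covariances have a closed form, obtained by combining the moments of `x` from all cases with the regression of `y` on `x` from the complete cases. The function name `stage1_bivariate_ml` and the data layout are illustrative assumptions; general missing-data patterns would need an iterative algorithm such as EM.

```python
def stage1_bivariate_ml(pairs):
    """Normal-theory ML estimates of means and covariances for (x, y).

    pairs: list of (x, y) tuples; x is always observed, y may be None.
    Returns ((mu_x, mu_y), ((s_xx, s_xy), (s_xy, s_yy))).
    All variances use divisor n (ML), not n - 1.
    """
    n = len(pairs)
    complete = [(x, y) for x, y in pairs if y is not None]
    m = len(complete)
    if m < 2:
        raise ValueError("need at least two complete cases")

    # Moments of x from every case, complete or not.
    mu_x = sum(x for x, _ in pairs) / n
    s_xx = sum((x - mu_x) ** 2 for x, _ in pairs) / n

    # Regression of y on x from the complete cases only.
    xbar_c = sum(x for x, _ in complete) / m
    ybar_c = sum(y for _, y in complete) / m
    sxx_c = sum((x - xbar_c) ** 2 for x, _ in complete) / m
    sxy_c = sum((x - xbar_c) * (y - ybar_c) for x, y in complete) / m
    syy_c = sum((y - ybar_c) ** 2 for _, y in complete) / m
    if sxx_c == 0:
        raise ValueError("x has no variance among complete cases")
    slope = sxy_c / sxx_c
    intercept = ybar_c - slope * xbar_c
    resid_var = syy_c - slope * sxy_c

    # Recombine: y moments implied by the regression and the full-sample x.
    mu_y = intercept + slope * mu_x
    s_xy = slope * s_xx
    s_yy = resid_var + slope ** 2 * s_xx
    return (mu_x, mu_y), ((s_xx, s_xy), (s_xy, s_yy))


if __name__ == "__main__":
    data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, None), (5.0, None)]
    mean, cov = stage1_bivariate_ml(data)
    print("means:", mean)
    print("covariance:", cov)
```

In a second stage, a structural model would be fitted to these estimated means and covariances in place of the usual complete-data sample moments. With no missing `y` values the function reduces to the ordinary ML sample moments, which is a convenient sanity check.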
