A Two-Stage Approach to Missing Data: Theory and Application to Auxiliary Variables

A well-known ad hoc approach to conducting structural equation modeling with missing data is to obtain a saturated maximum likelihood (ML) estimate of the population covariance matrix and then to use this estimate in the complete-data ML fitting function to obtain parameter estimates. This two-stage (TS) approach is appealing because it minimizes a familiar function while being only marginally less efficient than the full information ML (FIML) approach. Additional advantages of the TS approach are that it allows for easy incorporation of auxiliary variables and that it is more stable in smaller samples. The main disadvantage is that the standard errors and test statistics provided by the complete-data routine will not be correct. Empirical approaches to finding the right corrections for the TS approach have failed to provide unequivocal solutions. In this article, correct standard errors and test statistics for the TS approach are developed and studied for normally distributed data that are missing completely at random (MCAR) or missing at random (MAR). The new TS approach performs well in all conditions, is only marginally less efficient than the FIML approach (and is sometimes more efficient), and has good coverage. Additionally, the residual-based TS statistic outperforms the FIML test statistic in smaller samples. The TS method is thus a viable alternative to FIML, especially in small samples, and its further study is encouraged.
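To make the two stages concrete, the following is a minimal sketch, not the article's implementation. It assumes NumPy, data stored as an array with NaN marking missing entries, and a standard EM algorithm for the saturated normal-theory estimates; the function names `em_saturated` and `f_ml` are hypothetical. Stage 1 estimates the saturated mean vector and covariance matrix from the incomplete data. Stage 2 plugs that covariance matrix into the complete-data ML discrepancy function, F = log|Sigma(theta)| + tr(S Sigma(theta)^-1) - log|S| - p, which any general-purpose optimizer could then minimize over the model parameters.

```python
import numpy as np


def em_saturated(X, n_iter=200, tol=1e-8):
    """Stage 1 (sketch): saturated ML mean and covariance via EM.

    X is an (n, p) array with np.nan for missing entries. Assumes
    multivariate normality and ignorable (MCAR or MAR) missingness.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    # Starting values: available-case means and a diagonal covariance.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        sum_x = np.zeros(p)
        sum_xx = np.zeros((p, p))
        for i in range(n):
            m, o = miss[i], ~miss[i]
            x = X[i].copy()
            c = np.zeros((p, p))  # conditional covariance of missing part
            if m.any():
                if o.any():
                    # Regress missing variables on the observed ones.
                    b = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)])
                    x[m] = mu[m] + (X[i, o] - mu[o]) @ b
                    c[np.ix_(m, m)] = sigma[np.ix_(m, m)] - sigma[np.ix_(m, o)] @ b
                else:
                    x[m] = mu[m]
                    c[np.ix_(m, m)] = sigma[np.ix_(m, m)]
            sum_x += x
            sum_xx += np.outer(x, x) + c
        mu_new = sum_x / n
        sigma_new = sum_xx / n - np.outer(mu_new, mu_new)
        delta = max(np.abs(mu_new - mu).max(), np.abs(sigma_new - sigma).max())
        mu, sigma = mu_new, sigma_new
        if delta < tol:
            break
    return mu, sigma


def f_ml(sigma_model, s):
    """Stage 2 (sketch): complete-data ML discrepancy between the
    model-implied covariance sigma_model and the stage-1 estimate s."""
    p = s.shape[0]
    _, logdet_model = np.linalg.slogdet(sigma_model)
    _, logdet_s = np.linalg.slogdet(s)
    return logdet_model + np.trace(s @ np.linalg.inv(sigma_model)) - logdet_s - p


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.normal(size=(100, 3))
    data[:15, 0] = np.nan  # illustrative missingness pattern only
    mu_hat, sigma_hat = em_saturated(data)
    # Discrepancy of an independence model (diagonal covariance).
    print(f_ml(np.diag(np.diag(sigma_hat)), sigma_hat))
```

Standard errors and test statistics read off a complete-data routine applied to the stage-1 estimate would still need the corrections that the article develops; this sketch covers only the point estimation steps.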
