Fitting Ordinal Factor Analysis Models With Missing Data: A Comparison Between Pairwise Deletion and Multiple Imputation

This study compares two missing data procedures in the context of ordinal factor analysis models: pairwise deletion (PD; the default setting in Mplus) and multiple imputation (MI). We examine which procedure yields parameter estimates and model fit indices closer to those obtained from complete data. The performance of PD and MI is compared under a wide range of conditions, including number of response categories, sample size, percentage of missingness, and degree of model misfit. Results indicate that both PD and MI yield parameter estimates similar to those from analysis of complete data when the data are missing completely at random (MCAR). When the data are missing at random (MAR), PD parameter estimates are severely biased across the parameter combinations studied. When the percentage of missingness is less than 50%, MI yields parameter estimates that are similar to results from complete data. However, the fit indices (i.e., χ², RMSEA, and WRMR) suggest a worse fit than is observed with complete data. We recommend that applied researchers use MI when fitting ordinal factor models with missing data. We further recommend interpreting model fit based on the TLI and CFI incremental fit indices.
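The contrast between MCAR and MAR described above can be illustrated with a small simulation. The sketch below is not the study's design: the two-item one-factor model, the loading of 0.7, the thresholds, the missingness rates, and all function names are assumptions chosen for illustration. It shows how an available-case summary of the kind pairwise deletion relies on (here, the observed mean of an ordinal item) stays close to the complete-data value under MCAR but drifts under MAR.

```python
import random
from statistics import mean


def simulate_items(n, loading=0.7, thresholds=(-1.0, 0.0, 1.0), seed=0):
    """Draw two ordinal items (categories 0..3) from a one-factor model
    by thresholding continuous latent responses. All values are assumed."""
    rng = random.Random(seed)
    resid_sd = (1 - loading ** 2) ** 0.5
    rows = []
    for _ in range(n):
        eta = rng.gauss(0, 1)  # common factor score
        row = []
        for _ in range(2):
            y_star = loading * eta + resid_sd * rng.gauss(0, 1)
            row.append(sum(y_star > t for t in thresholds))
        rows.append(row)
    return rows


def delete_mcar(rows, rate, seed=1):
    """MCAR: item 2 is missing with the same probability for everyone."""
    rng = random.Random(seed)
    return [[x, None if rng.random() < rate else y] for x, y in rows]


def delete_mar(rows, rate_low, rate_high, seed=1):
    """MAR: missingness on item 2 depends only on the observed item 1."""
    rng = random.Random(seed)
    out = []
    for x, y in rows:
        p = rate_high if x >= 2 else rate_low
        out.append([x, None if rng.random() < p else y])
    return out


def observed_mean(rows):
    """Available-case mean of item 2 (missing values are skipped)."""
    return mean(y for _, y in rows if y is not None)


data = simulate_items(20000)
complete = observed_mean(data)
mcar_bias = observed_mean(delete_mcar(data, rate=0.3)) - complete
mar_bias = observed_mean(delete_mar(data, rate_low=0.1, rate_high=0.5)) - complete

print(f"complete-data mean of item 2: {complete:.3f}")
print(f"available-case bias under MCAR: {mcar_bias:+.3f}")
print(f"available-case bias under MAR:  {mar_bias:+.3f}")
```

Under the MAR mechanism assumed here, respondents with high scores on item 1 are more likely to be missing on item 2, so the observed cases over-represent low scorers and the available-case mean is pulled downward. Multiple imputation addresses this by conditioning the imputed values on item 1; that step is not shown in this sketch.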
