9. Multiple Imputation of Incomplete Categorical Data Using Latent Class Analysis

We propose using latent class analysis as an alternative to log-linear analysis for the multiple imputation of incomplete categorical data. Like log-linear models, latent class models can describe complex association structures between the variables used in the imputation model. Unlike log-linear models, however, latent class models can be used to build large imputation models containing more than a few categorical variables. To obtain imputations that reflect uncertainty about the unknown model parameters, we use a nonparametric bootstrap procedure as an alternative to the more common full Bayesian approach. The proposed multiple imputation method, which is implemented in the Latent GOLD software for latent class analysis, is illustrated with two examples. In a simulated data example, we compare the new method to well-established methods such as maximum likelihood estimation with incomplete data and multiple imputation using a saturated log-linear model. This example shows that the proposed method yields unbiased parameter estimates and standard errors. The second example is an application to a typical social sciences data set containing 79 variables, all of which are included in the imputation model. The proposed method is especially useful for such large data sets because standard methods for dealing with missing data in categorical variables break down when the number of variables is this large.
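The procedure summarized above (bootstrap the data, fit a latent class model to each bootstrap sample, then draw the missing values from the fitted model) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Latent GOLD implementation: the function names `fit_lca`, `posterior`, `impute_once`, and `multiple_impute` are hypothetical, missing values are coded as `-1`, missingness is assumed ignorable, the model is an unrestricted latent class model fitted by a plain EM loop with a single random start, and the number of classes is fixed in advance rather than selected with an information criterion.

```python
import numpy as np

def posterior(X, pi, theta):
    """P(class | observed items) per row; missing items (-1) are skipped."""
    log_post = np.tile(np.log(pi), (X.shape[0], 1))
    for j, th in enumerate(theta):
        obs = X[:, j] >= 0
        # th[:, codes] has shape (classes, rows); transpose to (rows, classes)
        log_post[obs] += np.log(th[:, X[obs, j]]).T
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def fit_lca(X, n_classes, n_cats, n_iter=50, rng=None):
    """EM for a latent class model on integer-coded data with -1 = missing."""
    rng = np.random.default_rng(rng)
    pi = np.full(n_classes, 1.0 / n_classes)
    # theta[j][k, c] = P(X_j = c | class k), random starting values
    theta = [rng.dirichlet(np.ones(c), size=n_classes) for c in n_cats]
    for _ in range(n_iter):
        post = posterior(X, pi, theta)            # E-step
        pi = post.mean(axis=0)                    # M-step: class sizes
        for j, c_j in enumerate(n_cats):          # M-step: item probabilities
            counts = np.full((n_classes, c_j), 1e-6)  # tiny smoothing constant
            for c in range(c_j):
                counts[:, c] += post[X[:, j] == c].sum(axis=0)
            theta[j] = counts / counts.sum(axis=1, keepdims=True)
    return pi, theta

def impute_once(X, pi, theta, rng):
    """Draw a class per row, then draw each missing item given that class."""
    post = posterior(X, pi, theta)
    u = rng.random((X.shape[0], 1))
    z = np.minimum((u > post.cumsum(axis=1)).sum(axis=1), len(pi) - 1)
    X_imp = X.copy()
    for j, th in enumerate(theta):
        for i in np.flatnonzero(X[:, j] < 0):
            X_imp[i, j] = rng.choice(th.shape[1], p=th[z[i]])
    return X_imp

def multiple_impute(X, n_classes, n_cats, m=5, seed=0):
    """Return m completed copies of X, one per nonparametric bootstrap fit."""
    rng = np.random.default_rng(seed)
    completed = []
    for _ in range(m):
        # Resample rows with replacement so that each imputation uses
        # different parameter estimates, reflecting parameter uncertainty.
        boot = X[rng.integers(0, len(X), size=len(X))]
        pi, theta = fit_lca(boot, n_classes, n_cats, rng=rng)
        completed.append(impute_once(X, pi, theta, rng))
    return completed
```

Each completed data set would then be analyzed separately and the results pooled in the usual multiple imputation manner. A realistic application would also use several random starts per fit and a data-driven choice of the number of classes, both of which this sketch omits.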
