Separating Latent Classes by Information Criteria

This study evaluates the performance of information criteria used to separate latent classes. Datasets were simulated under designs that varied the number of latent classes, the sample size, the parameter structure, and the latent-class complexity. The core result of the experiment is the average accuracy rate with which each information criterion selected the designed number of latent classes. The study shows that widely used information criteria, e.g., AIC, BIC, and CAIC, can perform poorly under some circumstances. Including a sample size adjustment (Rissanen, 1978) improves these unsatisfactory performances considerably, which makes the adjustment a plausible solution for separating latent classes. Guidelines are provided to help achieve optimum use of the model fit indices.
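To make the comparison concrete, the sketch below computes the criteria named above from a fitted model's log-likelihood and selects the number of classes with the smallest value. It is illustrative only: the abstract does not give formulas, so the standard textbook definitions of AIC, BIC, and CAIC are used, and the sample size adjustment is assumed to take the commonly used form that replaces n with (n + 2) / 24 in the BIC penalty. The function names and the example log-likelihoods are hypothetical and are not results from the study.

```python
import math


def information_criteria(log_lik, n_params, n_obs):
    """Return common information criteria for one fitted model.

    log_lik  : maximized log-likelihood of the model
    n_params : number of freely estimated parameters
    n_obs    : sample size
    """
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "CAIC": deviance + n_params * (math.log(n_obs) + 1.0),
        # Assumed form of the sample size adjustment: n is replaced by (n + 2) / 24.
        "adjusted BIC": deviance + n_params * math.log((n_obs + 2.0) / 24.0),
    }


def select_classes(fits, n_obs, criterion):
    """Pick the number of classes that minimizes the chosen criterion.

    fits maps number of classes -> (log-likelihood, number of free parameters).
    """
    scores = {
        k: information_criteria(log_lik, n_params, n_obs)[criterion]
        for k, (log_lik, n_params) in fits.items()
    }
    return min(scores, key=scores.get)


# Hypothetical fits for 1 to 3 classes on 5 binary items (made-up log-likelihoods).
# A K-class model with 5 binary items has 5 * K + (K - 1) free parameters.
example_fits = {1: (-1300.0, 5), 2: (-1270.0, 11), 3: (-1255.0, 17)}

for name in ("AIC", "BIC", "CAIC", "adjusted BIC"):
    print(name, select_classes(example_fits, n_obs=200, criterion=name))
```

Because the adjusted penalty grows more slowly with n than the BIC penalty, the two criteria can select different numbers of classes for the same set of fitted models, which is the kind of disagreement the simulation study quantifies.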

[1]  H. Akaike Statistical predictor identification , 1970 .

[2]  Jean-Yves Pitarakis,et al.  Lag length estimation in large dimensional systems , 2002 .

[3]  John D. Kalbfleisch,et al.  Penalized minimum‐distance estimates in finite mixture models , 1996 .

[4]  Solomon Kullback,et al.  Information Theory and Statistics , 1960 .

[5]  H. Bozdogan Model selection and Akaike's Information Criterion (AIC): The general theory and its analytical extensions , 1987 .

[6]  L. A. Goodman Exploratory latent structure analysis using both identifiable and unidentifiable models , 1974 .

[7]  J. Rissanen,et al.  Modeling By Shortest Data Description , 1978, Autom..

[8]  Martin L. Puterman,et al.  Mixed logistic regression models , 1998 .

[9]  K. Lange A gradient algorithm locally equivalent to the EM algorithm , 1995 .

[10]  M. Puterman,et al.  Mixed Poisson regression models with covariate dependent rates. , 1996, Biometrics.

[11]  B. G. Quinn,et al.  The determination of the order of an autoregression , 1979 .

[12]  Edward I. George,et al.  Bayesian Model Selection , 2006 .

[13]  G. Verbeke,et al.  A Linear Mixed-Effects Model with Heterogeneity in the Random-Effects Population , 1996 .

[14]  B. Everitt A Monte Carlo Investigation of the Likelihood Ratio Test for Number of Classes in Latent Class Analysis. , 1988, Multivariate behavioral research.

[15]  Daniel J. Bauer,et al.  Overextraction of latent trajectory classes: Much ado about nothing? Reply to Rindskopf (2003), Muthén (2003), and Cudeck and Henly (2003) , 2003 .

[16]  C. Clogg Latent Class Models , 1995 .

[17]  B. Everitt A Monte Carlo Investigation Of The Likelihood Ratio Test For The Number Of Components In A Mixture Of Normal Distributions. , 1981, Multivariate behavioral research.

[18]  P L Fidler,et al.  Goodness-of-Fit Testing for Latent Class Models. , 1993, Multivariate behavioral research.

[19]  David Draper,et al.  Assessment and Propagation of Model Uncertainty , 2011 .

[20]  Bengt Muthén,et al.  Latent variable modeling in heterogeneous populations , 1989 .

[21]  M. Woodroofe On Model Selection and the Arc Sine Laws , 1982 .

[22]  S. Sclove Application of model-selection criteria to some problems in multivariate analysis , 1987 .

[23]  M A Young,et al.  Operational Definitions of Schizophrenia What Do They Identify? , 1982, The Journal of nervous and mental disease.

[24]  Robert Tibshirani,et al.  Estimating the number of clusters in a data set via the gap statistic , 2000 .

[25]  A. Gelfand,et al.  Bayesian Model Choice: Asymptotics and Exact Calculations , 1994 .

[26]  D. Rubin,et al.  Estimation and Hypothesis Testing in Finite Mixture Models , 1985 .

[27]  H. Hartley Maximum Likelihood Estimation from Incomplete Data , 1958 .

[28]  A. Raftery Bayesian Model Selection in Social Research , 1995 .

[29]  A. Madansky Determinantal methods in latent class analysis , 1960 .

[30]  Chih-Chien Yang,et al.  Evaluating latent class analysis models in qualitative phenotype identification , 2006, Comput. Stat. Data Anal..

[31]  Yiu-Fai Yung,et al.  Finite mixtures in confirmatory factor-analysis models , 1997 .

[32]  R. A. Leibler,et al.  On Information and Sufficiency , 1951 .

[33]  G. Celeux,et al.  An entropy criterion for assessing the number of clusters in a mixture model , 1996 .

[34]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[35]  George B. Macready,et al.  A Simulation Study of the Difference Chi-Square Statistic for Comparing Latent Class Models Under Violation of Regularity Conditions , 1989 .

[36]  Clifford M. Hurvich,et al.  Regression and time series model selection in small samples , 1989 .

[37]  G. McLachlan,et al.  The EM algorithm and extensions , 1996 .

[38]  George B. Macready,et al.  Concomitant-Variable Latent-Class Models , 1988 .

[39]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .

[40]  A. Pulver,et al.  Extended latent class approach to the study of familial/sporadic forms of a disease: Its application to the study of the heterogeneity of schizophrenia , 1994, Genetic epidemiology.

[41]  A. Goldberger,et al.  Estimation of a Model with Multiple Indicators and Multiple Causes of a Single Latent Variable , 1975 .

[42]  Geoffrey J. McLachlan,et al.  Finite Mixture Models , 2019, Annual Review of Statistics and Its Application.

[43]  Scott L. Zeger,et al.  Latent Variable Regression for Multiple Discrete Outcomes , 1997 .

[44]  M A Young,et al.  Evaluating diagnostic criteria: a latent class paradigm. , 1982, Journal of psychiatric research.

[45]  B. Muthén,et al.  Finite Mixture Modeling with Mixture Outcomes Using the EM Algorithm , 1999, Biometrics.

[46]  D. Binder Bayesian cluster analysis , 1978 .

[47]  Rick L. Andrews,et al.  A Comparison of Segment Retention Criteria for Finite Mixture Logit Models , 2003 .

[48]  C. Mitchell Dayton,et al.  Model Selection Information Criteria for Non-Nested Latent Class Models , 1997 .

[49]  D. Haughton On the Choice of a Model to Fit Data from an Exponential Family , 1988 .

[50]  M. Reiser,et al.  A Goodness-of-Fit Test for the Latent Class Model When Expected Frequencies are Small , 1999 .

[51]  D. Rindskopf,et al.  The value of latent class analysis in medical diagnosis. , 1986, Statistics in medicine.

[52]  Katherine E. Masyn,et al.  General growth mixture modeling for randomized preventive interventions. , 2001, Biostatistics.