Solving the problem of latent class selection

Correctly selecting the number of classes is a difficult step in classification methods, because the choice can significantly affect interpretation and may lead to an incorrect analysis of the phenomenon under study. The problem arises in latent class analysis, a method for classifying individuals on the basis of categorical data. The number of classes that make up the profiles of a population is usually determined with a likelihood ratio test. This use of the test is not theoretically justified. A model with k classes is obtained from one with k + 1 classes by placing a parameter on the boundary of the parameter space (for example, by setting a class proportion to zero). The usual regularity conditions therefore fail, and the test statistic does not follow its standard asymptotic chi-square distribution. To address this problem, we use information criteria to select the number of classes. Several information criteria have already been proposed for this purpose, but selection remains difficult because their performance depends on factors such as the number of classes and the sample size. In this paper, we conduct a comparative study of these criteria and provide an overview of which criteria are best suited to which kinds of data, so that models can be selected accurately.
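The following minimal Python sketch illustrates the information-criterion approach described above. It computes several widely used criteria from the maximized log-likelihood of each fitted model and reports, for every criterion, the number of classes it selects.

The sketch rests on three assumptions:

* It assumes an unrestricted latent class model for categorical items. The number of free parameters is then p = (K − 1) + K Σ_j (R_j − 1), where K is the number of classes and R_j is the number of categories of item j.
* Fitting the models (for example, by EM) is left to whatever latent class software is at hand. The function names are hypothetical.
* The log-likelihoods in the usage lines are invented purely for illustration and are not results from this study.

```python
import math
from typing import Dict, Sequence


def lca_num_params(n_classes: int, n_categories: Sequence[int]) -> int:
    """Free parameters of an unrestricted latent class model (assumed form).

    (K - 1) mixing proportions plus, in each of the K classes,
    (R_j - 1) response probabilities for every item j with R_j categories.
    """
    return (n_classes - 1) + n_classes * sum(r - 1 for r in n_categories)


def information_criteria(loglik: float, p: int, n: int) -> Dict[str, float]:
    """Penalized-likelihood criteria; for every one of them, smaller is better."""
    dev = -2.0 * loglik  # deviance term shared by all criteria
    aic = dev + 2 * p
    return {
        "AIC": aic,
        "AIC3": dev + 3 * p,
        # Small-sample corrected AIC; undefined unless n > p + 1.
        "AICc": (aic + 2 * p * (p + 1) / (n - p - 1)) if n > p + 1 else math.inf,
        "BIC": dev + p * math.log(n),
        "CAIC": dev + p * (math.log(n) + 1),
        # Sample-size adjusted BIC: n replaced by (n + 2) / 24.
        "SABIC": dev + p * math.log((n + 2) / 24),
        # Hannan-Quinn; needs n >= 3 so that log(log(n)) > 0.
        "HQ": dev + 2 * p * math.log(math.log(n)),
    }


def select_num_classes(
    logliks: Dict[int, float], n_categories: Sequence[int], n: int
) -> Dict[str, int]:
    """For each criterion, return the number of classes K that minimizes it."""
    table = {
        k: information_criteria(ll, lca_num_params(k, n_categories), n)
        for k, ll in logliks.items()
    }
    criteria = next(iter(table.values())).keys()
    return {c: min(table, key=lambda k: table[k][c]) for c in criteria}


# Illustration only: made-up maximized log-likelihoods for K = 1, 2, 3
# classes fitted to 5 binary items with n = 500 respondents.
example = {1: -1500.0, 2: -1400.0, 3: -1390.0}
print(select_num_classes(example, n_categories=[2] * 5, n=500))
# -> {'AIC': 3, 'AIC3': 3, 'AICc': 3, 'BIC': 2, 'CAIC': 2, 'SABIC': 3, 'HQ': 2}
```

All of these criteria share the form −2 log L + penalty(p, n). They differ only in how strongly the penalty grows with the number of parameters and with the sample size. This is why they can disagree on the same fitted models: in the illustration, the AIC-type criteria and SABIC favour three classes, while BIC, CAIC and HQ favour two.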