Estimation of parameters in latent class models using fuzzy clustering algorithms

Abstract: The mixture approach is an important technique in cluster analysis. Categorical data are usually analyzed under a latent class model, using a mixture of multivariate multinomial distributions, and parameter estimation is a key step in fitting such a mixture. This paper describes four approaches to estimating the parameters of a mixture of multivariate multinomial distributions. The first is an extended maximum likelihood (ML) method. The second is based on the well-known expectation maximization (EM) algorithm. The third is the classification maximum likelihood (CML) algorithm. The fourth is new: we propose a fuzzy class model and use it to derive a fuzzy classification maximum likelihood (FCML) approach for categorical data. The accuracy, robustness and effectiveness of the four algorithms in estimating the parameters of multivariate binomial mixtures are compared using real empirical data and samples drawn from two-class multivariate binomial mixtures. The results show that the proposed FCML algorithm is more accurate, robust and effective than the ML, EM and CML algorithms. We therefore recommend FCML as another good tool for estimating the parameters of multivariate multinomial mixture models.
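To make the EM approach named in the abstract concrete, the following is a minimal sketch of textbook EM for a latent class model with binary items, that is, a mixture of independent Bernoulli variables (the multivariate binomial case). It is not the authors' implementation and does not cover the extended ML, CML or FCML variants. The function names `em_latent_class` and `simulate`, the starting values, the iteration count and the simulated parameters are all assumptions chosen for illustration.

```python
import math
import random


def em_latent_class(data, n_classes=2, n_iter=200, seed=0):
    """Textbook EM for a latent class model with binary (0/1) items.

    data: list of equal-length lists of 0/1 responses.
    Returns (weights, probs, loglik), where probs[k][j] estimates
    P(item j = 1 | class k) and loglik is the log-likelihood at the
    parameters used in the last E-step.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    eps = 1e-9  # keeps logarithms finite (illustrative choice)
    weights = [1.0 / n_classes] * n_classes
    # Random starting values break the symmetry between classes.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(d)]
             for _ in range(n_classes)]
    loglik = float("-inf")
    for _ in range(n_iter):
        # E-step: posterior class probabilities for each observation.
        resp = []
        loglik = 0.0
        for x in data:
            logp = []
            for k in range(n_classes):
                lp = math.log(weights[k])
                for j in range(d):
                    p = min(max(probs[k][j], eps), 1.0 - eps)
                    lp += math.log(p) if x[j] else math.log(1.0 - p)
                logp.append(lp)
            m = max(logp)  # log-sum-exp for numerical stability
            s = sum(math.exp(v - m) for v in logp)
            loglik += m + math.log(s)
            resp.append([math.exp(v - m) / s for v in logp])
        # M-step: weighted relative frequencies.
        for k in range(n_classes):
            nk = sum(r[k] for r in resp)
            weights[k] = max(nk / n, eps)
            for j in range(d):
                num = sum(r[k] * x[j] for r, x in zip(resp, data))
                probs[k][j] = num / max(nk, eps)
    return weights, probs, loglik


def simulate(n, weights, probs, seed=1):
    """Draw n binary response vectors from a two-class mixture."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        k = 0 if rng.random() < weights[0] else 1
        data.append([1 if rng.random() < p else 0 for p in probs[k]])
    return data


# Hypothetical two-class example with four binary items.
data = simulate(500, [0.4, 0.6],
                [[0.9, 0.8, 0.9, 0.8], [0.2, 0.1, 0.2, 0.1]])
weights, probs, loglik = em_latent_class(data)
print("class weights:", [round(w, 3) for w in weights])
print("item probabilities:", [[round(p, 3) for p in row] for row in probs])
print("log-likelihood:", round(loglik, 2))
```

The class labels are arbitrary, so the estimated classes may come out in either order relative to the simulated ones. A hard-assignment variant in the spirit of CML would replace each posterior row in the E-step with a 0/1 indicator of the most probable class; that variant is not shown here.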
