Nonparametric Identification of Multivariate Mixtures

This article analyzes the identifiability of k-variate, M-component finite mixture models in which each component distribution has independent marginals, a class that includes the models used in latent class analysis. Without making parametric assumptions on the component distributions, we investigate how the number of components and the component distributions can be identified from the distribution function of the observed data. We reveal an important link between the number of variables (k), the number of values each variable can take, and the number of identifiable components. A lower bound on the number of components (M) is nonparametrically identifiable if k >= 2, and the maximum identifiable number of components is determined by the number of distinct values each variable takes. When M is known, the mixing proportions and the component distributions are nonparametrically identified from matrices constructed from the distribution function of the data if (i) k >= 3, (ii) two of the k variables take at least M distinct values, and (iii) these matrices satisfy certain rank and eigenvalue conditions. For the case of unknown M, we propose an algorithm that may identify M and the component distributions from data. We discuss a condition for nonparametric identification and its observable implications. When M cannot be identified, we use our identification condition to develop a procedure that consistently estimates a lower bound on the number of components by estimating the rank of a matrix constructed from the distribution function of the observed variables.
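
The rank argument behind the lower bound on M can be illustrated in the simplest setting. The following is a minimal sketch that assumes two discrete variables X1 and X2 with finitely many values. It uses the joint probability matrix P[a, b] = Pr(X1 = a, X2 = b) as one simple instance of a matrix built from the distribution of the data. Independence within components gives P = A' diag(pi) B, where row j of A (resp. B) is the pmf of X1 (resp. X2) in component j, so rank(P) <= M. The function names are hypothetical. The fixed relative threshold for the numerical rank is also an assumption: it works on exact population probabilities and is not a substitute for a statistical rank estimate from sample data.

```python
import numpy as np

def joint_pmf_matrix(weights, comp1, comp2):
    """P[a, b] = Pr(X1 = a, X2 = b) for a finite mixture in which X1 and X2
    are independent within each component (hypothetical helper).

    weights: (M,) mixing proportions.
    comp1:   (M, n1) array; row j is the pmf of X1 in component j.
    comp2:   (M, n2) array; row j is the pmf of X2 in component j.
    """
    return comp1.T @ np.diag(weights) @ comp2

def numerical_rank(P, rel_tol=1e-10):
    """Count singular values larger than rel_tol times the largest one."""
    s = np.linalg.svd(P, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

# Example: M = 3 components, each variable takes 4 values.
weights = np.array([0.5, 0.3, 0.2])
comp1 = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.1, 0.7, 0.1, 0.1],
                  [0.1, 0.1, 0.7, 0.1]])
comp2 = np.array([[0.6, 0.2, 0.1, 0.1],
                  [0.1, 0.6, 0.2, 0.1],
                  [0.1, 0.1, 0.2, 0.6]])
P = joint_pmf_matrix(weights, comp1, comp2)
rank_P = numerical_rank(P)  # here the lower bound rank(P) equals M

# Same weights, but each variable takes only 2 values: rank(P) <= 2 < M,
# so the bound cannot exceed the number of values each variable takes.
comp1_bin = np.array([[0.8, 0.2], [0.3, 0.7], [0.5, 0.5]])
comp2_bin = np.array([[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]])
rank_binary = numerical_rank(joint_pmf_matrix(weights, comp1_bin, comp2_bin))
```

The second case shows why the number of values matters: a 2 x 2 matrix has rank at most 2, whatever the true M.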

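When M is known and k = 3, a standard eigendecomposition argument shows how the mixing proportions and component distributions can be read off matrices of observed probabilities. This sketch assumes that X1 and X2 each take exactly M values, that X3 is discrete, and that the probabilities are population values rather than estimates. It may differ in detail from the construction in the article, and the function name is hypothetical. Let L be the M x M matrix whose column j is the pmf of X1 in component j. Let R be the matrix whose row j is the pmf of X2 in component j, and let D = diag(pi). Then P = L D R and P_c = L D V_c R, with V_c = diag(Pr(X3 = c | j)), so P_c P^{-1} = L V_c L^{-1}. Invertibility of P plays the role of the rank condition, and distinct diagonal entries of V_c play the role of the eigenvalue condition.

```python
import numpy as np

def identify_components(P, Pc):
    """Recover mixing weights and component pmfs (population sketch).

    P[a, b]  = Pr(X1 = a, X2 = b)          (M x M, assumed invertible)
    Pc[a, b] = Pr(X1 = a, X2 = b, X3 = c)  (M x M, for one fixed value c)
    The eigenvalues of Pc @ inv(P) are assumed to be distinct.
    """
    # Pc @ inv(P) = L @ diag(Pr(X3 = c | j)) @ inv(L), where column j of L
    # is the pmf of X1 in component j.
    eigvals, eigvecs = np.linalg.eig(Pc @ np.linalg.inv(P))
    eigvals, eigvecs = eigvals.real, eigvecs.real
    # Eigenvectors are only defined up to scale: rescale each column to sum to 1.
    L = eigvecs / eigvecs.sum(axis=0)
    # inv(L) @ P = diag(weights) @ R, and each row of R (a pmf of X2) sums to 1.
    DR = np.linalg.solve(L, P)
    weights = DR.sum(axis=1)
    R = DR / weights[:, None]
    # Rows of L.T and R are the component pmfs of X1 and X2; eigvals are
    # Pr(X3 = c | j). Component labels come out in an arbitrary order.
    return weights, L.T, R, eigvals

# Example: M = 3, X1 and X2 take 3 values, X3 is binary (c = 0 is used).
w = np.array([0.5, 0.3, 0.2])
p1 = np.array([[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]])
p2 = np.array([[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]])
p3 = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])

P = p1.T @ np.diag(w) @ p2                 # Pr(X1 = a, X2 = b)
Pc = p1.T @ np.diag(w * p3[:, 0]) @ p2     # Pr(X1 = a, X2 = b, X3 = 0)

w_hat, p1_hat, p2_hat, p3_hat = identify_components(P, Pc)
order = np.argsort(-p3_hat)  # fix the arbitrary labelling for comparison
```

Because components are identified only up to relabeling, the recovered quantities are compared with the true ones after sorting by the eigenvalues Pr(X3 = 0 | j).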