Nonparametric Identification and Estimation of Multivariate Mixtures

We study nonparametric identifiability of finite mixture models of k-variate data with M subpopulations, in which the components of the data vector are independent conditional on belonging to a subpopulation. We provide a sufficient condition for nonparametrically identifying M subpopulations when k ≥ 3. Our focus is on the relationship between the number of values the components of the data vector can take on and the number of identifiable subpopulations. Intuition suggests that if the data vector can take many different values, combining information from these values should help identification. Hall and Zhou (2003) show, however, that when k = 2, two-component finite mixture models are not nonparametrically identifiable regardless of the number of values the data vector can take. When k ≥ 3, a link emerges between the variation in the data vector and the number of identifiable subpopulations: the number of identifiable subpopulations increases as the data vector takes on additional distinct values. This points to the possibility of identifying many components even when k = 3, provided the data vector has a continuously distributed element. Our identification method is constructive and leads to an estimation strategy. The resulting estimator is less efficient than the MLE, but it can supply the initial value for the optimization algorithm used to compute the MLE. We also provide a sufficient condition for identifying the number of nonparametrically identifiable components, and we develop a method for statistically testing and consistently estimating that number. We extend these procedures to develop a test for the number of components in binomial mixtures.
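The link between the number of values the variables take on and the number of identifiable subpopulations can be illustrated numerically. The Python sketch below is a population-level illustration in the spirit of the eigenvalue-decomposition approach of Anderson (1954); it is not the authors' estimator. It assumes k = 3 discrete variables that each take exactly M = 2 values, hypothetical component distributions (pi, F1, F2, F3) used only to generate the observable joint probabilities, and distinct values of Pr(X3 = c | m) across subpopulations. It checks that the rank of the (X1, X2) probability matrix equals M and then recovers the components from observable quantities alone.

```python
import numpy as np

# Hypothetical example (not data from the paper): M = 2 subpopulations and
# k = 3 discrete measurements, each taking M = 2 values.
pi = np.array([0.4, 0.6])                    # mixing weights pi_m
F1 = np.array([[0.8, 0.2], [0.3, 0.7]])      # F1[m, a] = Pr(X1 = a | m)
F2 = np.array([[0.9, 0.1], [0.2, 0.8]])      # F2[m, b] = Pr(X2 = b | m)
F3 = np.array([[0.6, 0.4], [0.1, 0.9]])      # F3[m, c] = Pr(X3 = c | m)

# Observable population quantities implied by conditional independence.
P = F1.T @ np.diag(pi) @ F2                  # P[a, b] = Pr(X1 = a, X2 = b)
c = 0
P_c = F1.T @ np.diag(pi * F3[:, c]) @ F2     # P_c[a, b] = Pr(X1 = a, X2 = b, X3 = c)

# The rank of P equals M when F1 and F2 have full row rank; it can never
# exceed the number of values that X1 or X2 takes on.
M = np.linalg.matrix_rank(P)

# P_c inv(P) = F1.T diag(F3[:, c]) inv(F1.T): its eigenvalues are Pr(X3 = c | m)
# and its eigenvectors are the rows of F1 up to scale (distinct eigenvalues assumed).
eigvals, eigvecs = np.linalg.eig(P_c @ np.linalg.inv(P))
order = np.argsort(eigvals)[::-1]            # fix an (arbitrary) labelling
f3_c = eigvals[order].real
V = eigvecs[:, order]
F1_hat = (V / V.sum(axis=0)).T.real          # rescale each eigenvector to sum to one

# Recover the mixing weights: inv(F1.T) P = diag(pi) F2, whose rows sum to pi_m.
pi_hat = (np.linalg.inv(F1_hat.T) @ P).sum(axis=1)

print(M, f3_c, pi_hat)
```

With sample data, P and P_c would be replaced by empirical frequencies, so the exact `matrix_rank` call would have to give way to a statistical rank test. The ordering of the eigenvectors is arbitrary, which reflects the usual label-switching ambiguity of mixture models.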

[1] J. Henna. On estimating of the number of constituents of a finite mixture of continuous distributions. 1985.

[2] A. Cohen et al. Finite Mixture Distributions. 1982.

[3] Thomas P. Hettmansperger et al. Semiparametric mixture models and repeated measures: the multinomial cut point model. 2004.

[4] James J. Heckman et al. Life Cycle Schooling and Dynamic Selection Bias: Models and Evidence for Five Cohorts of American Males. Journal of Political Economy, 1998.

[5] T. N. Sriram et al. Robust Estimation of Mixture Complexity. 2006.

[6] E. Gassiat et al. Testing the order of a model using locally conic parametrization: population mixtures and stationary ARMA processes. 1999.

[7] W. Blischke. Moment Estimators for the Parameters of a Mixture of Two Binomial Distributions. 1960.

[8] H. Kasahara et al. Nonparametric Identification of Finite Mixture Models of Dynamic Discrete Choices. 2009.

[9] P. Deb. Finite Mixture Models. 2008.

[10] E. Gassiat et al. The estimation of the order of a mixture model. 1997.

[11] Leo A. Goodman et al. On the estimation of parameters in latent structure analysis. 1979.

[12] S. Yakowitz et al. On the Identifiability of Finite Mixtures. 1968.

[13] Ryan T. Elmore et al. Estimating Component Cumulative Distribution Functions in Finite Mixture Models. 2004.

[14] Stephen G. Donald et al. Inferring the rank of a matrix. 1997.

[15] Neil Henry. Latent structure analysis. 1969.

[16] P. Hall et al. Nonparametric inference in multivariate mixtures. 2005.

[17] M. Keane et al. The Career Decisions of Young Men. Journal of Political Economy, 1997.

[18] Thomas P. Hettmansperger et al. Almost nonparametric inference for repeated measures in mixture models. 2000.

[19] F. Krauss. Latent Structure Analysis. 1980.

[20] John D. Kalbfleisch et al. Penalized minimum-distance estimates in finite mixture models. 1996.

[21] Y. Shao et al. Asymptotics for likelihood ratio tests under loss of identifiability. 2003.

[22] J. Robin et al. Tests of Rank. Econometric Theory, 2000.

[23] B. Leroux. Consistent estimation of a mixing distribution. 1992.

[24] Paul F. Lazarsfeld et al. Latent Structure Analysis. 1969.

[25] Geoffrey J. McLachlan et al. Mixture models: inference and applications to clustering. 1989.

[26] Stephen G. Donald et al. On the asymptotic properties of LDU-based tests of the rank of a matrix. 1996.

[27] N. Schork et al. On the asymmetry of biological frequency distributions. Genetic Epidemiology, 1990.

[28] A. Lewbel et al. Testing the Rank and Definiteness of Estimated Matrices with Applications to Factor, State-Space and ARMA Models. 1992.

[29] W. Gibson. An extension of Anderson's solution for the latent structure equations. 1955.

[30] G. M. Tallis et al. Identifiability of mixtures. Journal of the Australian Mathematical Society. Series A. Pure Mathematics and Statistics, 1982.

[31] A. F. Smith et al. Statistical analysis of finite mixture distributions. 1986.

[32] Xiao-Hua Zhou et al. Nonparametric Estimation of Component Distributions in a Multivariate Mixture. 2003.

[33] W. Blischke. Estimating the Parameters of Mixtures of Binomial Distributions. 1964.

[34] Adele Cutler et al. Information Ratios for Validating Mixture Analysis. 1992.

[35] K. Roeder et al. Residual diagnostics for mixture models. 1992.

[36] Xiao-Hua Zhou et al. Nonparametric Estimation of ROC Curves in the Absence of a Gold Standard. Biometrics, 2005.

[37] K. Roeder. A Graphical Technique for Determining the Number of Components in a Mixture of Normals. 1994.

[38] R. Paap et al. Generalized Reduced Rank Tests Using the Singular Value Decomposition. 2003.

[39] T. W. Anderson. On estimation of parameters in latent structure analysis. 1954.

[40] Lancelot F. James et al. Consistent estimation of mixture complexity. 2001.

[41] B. Lindsay. Mixture models: theory, geometry, and applications. 1995.