Inference for multivariate normal mixtures

Multivariate normal mixtures provide a flexible model for high-dimensional data. They are widely used in statistical genetics, statistical finance, and other disciplines. Because the likelihood function is unbounded, classical likelihood-based methods, which may have nice practical properties, are inconsistent. In this paper, we recommend a penalized likelihood method for estimating the mixing distribution. We show that the maximum penalized likelihood estimator is strongly consistent when the number of components has a known upper bound. We also explore a convenient EM algorithm for computing the maximum penalized likelihood estimator. Extensive simulations are conducted to explore the effectiveness and the practical limitations of both the new method and the ratified maximum likelihood estimators. Guidelines are provided based on the simulation results.
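The unboundedness mentioned above can be illustrated numerically. The following is a minimal sketch, not the paper's implementation. It assumes a two-component mixture in two dimensions with isotropic covariances, where the second component is centred on the first observation and its variance shrinks towards zero. The penalty used, -a_n [tr(S_x Σ⁻¹) + log|Σ|] summed over components, is one form used in the penalized-likelihood literature; the abstract does not state the penalty, so this form, the choice a_n = 1/n, the toy data, and all function names are assumptions made for illustration.

```python
import math

# Toy 2-D data set (hypothetical values, chosen only for illustration).
data = [(0.3, -0.2), (1.1, 0.4), (-0.7, 0.9),
        (0.5, -1.3), (-1.2, -0.4), (2.0, 1.5)]


def log_normal_pdf_iso(x, mu, var):
    """Log density of a 2-D normal with mean mu and covariance var * I."""
    sq_dist = (x[0] - mu[0]) ** 2 + (x[1] - mu[1]) ** 2
    return -math.log(2.0 * math.pi * var) - sq_dist / (2.0 * var)


def mixture_loglik(points, var2):
    """Log-likelihood of 0.5 * N(0, I) + 0.5 * N(points[0], var2 * I)."""
    total = 0.0
    for x in points:
        a = math.log(0.5) + log_normal_pdf_iso(x, (0.0, 0.0), 1.0)
        b = math.log(0.5) + log_normal_pdf_iso(x, points[0], var2)
        m = max(a, b)  # log-sum-exp for numerical stability
        total += m + math.log(math.exp(a - m) + math.exp(b - m))
    return total


def penalty(points, var2, a_n):
    """Assumed penalty: -a_n * sum_j [tr(S_x / var_j) + log|var_j * I|]."""
    n = len(points)
    mean = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
    # Trace of the sample covariance matrix S_x.
    trace_sx = sum((p[0] - mean[0]) ** 2 + (p[1] - mean[1]) ** 2
                   for p in points) / n
    pen = 0.0
    for var in (1.0, var2):
        # In 2-D, log|var * I| = 2 * log(var).
        pen -= a_n * (trace_sx / var + 2.0 * math.log(var))
    return pen


def penalized_loglik(points, var2, a_n):
    return mixture_loglik(points, var2) + penalty(points, var2, a_n)


if __name__ == "__main__":
    a_n = 1.0 / len(data)  # assumed penalty weight
    for var2 in (1e-2, 1e-4, 1e-8):
        print(var2, mixture_loglik(data, var2),
              penalized_loglik(data, var2, a_n))
```

As the variance of the degenerate component shrinks, the ordinary log-likelihood keeps increasing, while the penalized version decreases because the trace term grows like 1/variance and dominates the logarithmic gain. Under the same assumed penalty, the M-step covariance update of an EM algorithm has the closed form Σ_j = (2 a_n S_x + S_j) / (2 a_n + n_j), where S_j is the weighted scatter matrix and n_j the sum of posterior weights for component j.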
