Estimation of finite mixtures with symmetric components

Background knowledge sometimes makes it clear that a population under investigation consists of a known number of subpopulations, mixed in certain proportions, whose distributions belong to the same, yet unknown, family. While a parametric family is commonly used in practice, a nonparametric family can be considered instead to avoid distributional misspecification. In this article, we propose using a mixture-based nonparametric family for the component distribution in a finite mixture model, in contrast to some recent research that uses a kernel-based approach. In particular, we present a semiparametric maximum likelihood estimation procedure for the model parameters and address the bandwidth selection problem with some popular model selection methods. Empirical comparisons through simulation studies and three real data sets suggest that estimators based on our mixture-based approach are more efficient than those based on the kernel-based approach, in terms of both parameter estimation and overall density estimation.
