Generalized Mixtures of Finite Mixtures and Telescoping Sampling

Within a Bayesian framework, we comprehensively investigate mixtures of finite mixtures (MFMs), the model class in which a prior is specified on the number of components. This model class has applications in model-based clustering and in semi-parametric density approximation, but it requires suitable prior specifications and inference methods to exploit its full potential. We contribute to the Bayesian analysis of MFMs by considering a generalized class of MFMs that contains static and dynamic MFMs, in which the Dirichlet parameter of the component weights is fixed or depends on the number of components, respectively. We emphasize the distinction between the number of components $K$ of a mixture and the number of clusters $K_+$, i.e., the number of filled components given the data. In the MFM model, $K_+$ is a random variable, and its prior depends on the prior on the number of components $K$ and on the prior on the mixture weights. We characterize the prior on the number of clusters $K_+$ for generalized MFMs and derive computationally feasible formulas to calculate this implicit prior. In addition, we propose a flexible class of prior distributions for the number of components $K$ and link MFMs to Bayesian non-parametric mixtures. For posterior inference in a generalized MFM, we propose the novel telescoping sampler, which allows Bayesian inference for mixtures with arbitrary component distributions without resorting to reversible jump Markov chain Monte Carlo (RJMCMC) methods. The telescoping sampler explicitly samples the number of components, but otherwise requires only the usual MCMC steps for estimating a finite mixture model. The ease of its application with different component distributions is demonstrated on several data sets.
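To make the distinction between $K$ and $K_+$ concrete, the implicit prior on $K_+$ can be approximated by simulation. The following is a minimal sketch under stated assumptions, not the computational formulas derived in the paper: the prior $K - 1 \sim \text{Poisson}(\lambda)$, the dynamic form $\gamma_K = \alpha / K$, and the function name `simulate_k_plus` with all its arguments are illustrative choices only.

```python
import numpy as np


def simulate_k_plus(n_obs, n_draws, dynamic, alpha=1.0, gamma=1.0, rate=1.0, seed=0):
    """Monte Carlo draws of (K, K_+) under a toy MFM prior.

    Assumptions (illustrative, not taken from the paper):
      * K - 1 ~ Poisson(rate)
      * static MFM:  symmetric Dirichlet parameter gamma_K = gamma
      * dynamic MFM: symmetric Dirichlet parameter gamma_K = alpha / K
    """
    rng = np.random.default_rng(seed)
    k_draws = np.empty(n_draws, dtype=int)
    k_plus_draws = np.empty(n_draws, dtype=int)
    for i in range(n_draws):
        # Number of components K, drawn from its prior.
        k = 1 + rng.poisson(rate)
        # Dirichlet parameter: fixed (static) or depending on K (dynamic).
        conc = alpha / k if dynamic else gamma
        # Component weights given K.
        weights = rng.dirichlet(np.full(k, conc))
        # Allocate n_obs observations to components and count the filled ones.
        counts = rng.multinomial(n_obs, weights)
        k_draws[i] = k
        k_plus_draws[i] = np.count_nonzero(counts)
    return k_draws, k_plus_draws


if __name__ == "__main__":
    for dynamic in (False, True):
        k, k_plus = simulate_k_plus(n_obs=100, n_draws=5000, dynamic=dynamic)
        # Empirical prior probabilities of K_+ = 0, 1, 2, ...
        pmf = np.bincount(k_plus) / k_plus.size
        label = "dynamic" if dynamic else "static"
        print(label, "mean K:", k.mean(), "mean K_+:", k_plus.mean())
        print(label, "empirical prior of K_+:", np.round(pmf, 3))
```

Every draw satisfies $1 \le K_+ \le K$, so the simulated prior on $K_+$ is never more spread towards large values than the prior on $K$. How far the two differ depends on the sample size and on the Dirichlet parameter.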
