On posterior contraction of parameters and interpretability in Bayesian mixture modeling

We study posterior contraction behaviors for parameters of interest in Bayesian mixture modeling, where the number of mixing components is unknown and the model itself may or may not be correctly specified. Two representative types of prior specification are considered: one explicitly requires a prior distribution on the number of mixture components, while the other places a nonparametric prior on the space of mixing distributions. The former is shown to yield an optimal rate of posterior contraction on the model parameters under minimal conditions, while the latter can be used to consistently recover the unknown number of mixture components with the help of a fast probabilistic post-processing procedure. We then extend the study of these Bayesian procedures to the realistic setting of model misspecification. We show that the choice of kernel density function plays perhaps the most impactful role in determining the posterior contraction rates in misspecified situations. Drawing on the concrete posterior contraction rates established in this paper, we highlight some of the tradeoffs between model expressiveness and interpretability that a statistical modeler must negotiate in the rich world of mixture modeling.
