Sampling for Bayesian Mixture Models: MCMC with Polynomial-Time Mixing

We study the problem of sampling from the power posterior distribution in Bayesian Gaussian mixture models, a robustified version of the classical posterior. The power posterior is known to be non-log-concave and multimodal, which leads to exponential mixing times for some standard MCMC algorithms. We introduce and study the Reflected Metropolis-Hastings Random Walk (RMRW) algorithm for sampling from it. For symmetric two-component Gaussian mixtures, we prove that its mixing time is bounded as $d^{1.5}(d + \Vert \theta_{0} \Vert^2)^{4.5}$ whenever the sample size $n$ is of the order $d (d + \Vert \theta_{0} \Vert^2)$. Notably, this result requires no conditions on the separation of the two means. En route to proving this bound, we establish new results of possible independent interest on combining Poincaré inequalities for conditional and marginal densities.
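To make the setup concrete, here is a minimal, hypothetical sketch of a reflected Metropolis-Hastings random walk targeting the power posterior of a symmetric two-component Gaussian mixture $0.5\,N(\theta, I_d) + 0.5\,N(-\theta, I_d)$. The specific proposal (with probability 1/2 a local Gaussian step, otherwise a reflection of the current state about the origin followed by a Gaussian step), the power `beta`, the Gaussian prior, and the function names are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def log_power_posterior(theta, X, beta=1.0, prior_var=1.0):
    """Unnormalized log power posterior for the symmetric mixture
    0.5*N(theta, I) + 0.5*N(-theta, I): the likelihood is raised to the
    power beta and combined with a Gaussian prior (illustrative choices)."""
    d_plus = -0.5 * np.sum((X - theta) ** 2, axis=1)   # per-sample log N(x; theta, I), up to constants
    d_minus = -0.5 * np.sum((X + theta) ** 2, axis=1)  # per-sample log N(x; -theta, I), up to constants
    log_lik = np.sum(np.logaddexp(d_plus, d_minus) + np.log(0.5))
    log_prior = -0.5 * np.sum(theta ** 2) / prior_var
    return beta * log_lik + log_prior

def rmrw_sample(X, n_iter=10_000, step=0.1, beta=1.0, rng=None):
    """Hypothetical reflected Metropolis-Hastings random walk: with
    probability 1/2 take a local Gaussian step, otherwise reflect the
    current state about the origin before stepping, then apply the
    standard Metropolis accept/reject rule."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    theta = rng.standard_normal(d)
    log_p = log_power_posterior(theta, X, beta)
    samples = []
    for _ in range(n_iter):
        base = theta if rng.random() < 0.5 else -theta      # reflection component of the proposal
        proposal = base + step * rng.standard_normal(d)     # local Gaussian step
        log_p_new = log_power_posterior(proposal, X, beta)
        if np.log(rng.random()) < log_p_new - log_p:        # Metropolis acceptance
            theta, log_p = proposal, log_p_new
        samples.append(theta.copy())
    return np.array(samples)
```

Because both proposal components are symmetric in the current and proposed states, the plain Metropolis ratio suffices for correctness; the reflection move is what allows the chain to hop between the two symmetric modes near $\pm \theta_0$ without having to traverse the low-density region separating them.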
