A Diffusion Process Perspective on Posterior Contraction Rates for Parameters

We show that diffusion processes can be exploited to study the posterior contraction rates of parameters in Bayesian models. By treating the posterior distribution as a stationary distribution of a stochastic differential equation (SDE), posterior convergence rates can be established via control of the moments of the corresponding SDE. Our results depend on the structure of the population log-likelihood function, obtained in the limit of an infinite sample size, and on stochastic perturbation bounds between the population and sample log-likelihood functions. When the population log-likelihood is strongly concave, we establish posterior convergence of a $d$-dimensional parameter at the optimal rate $(d/n)^{1/2}$. In the weakly concave setting, we show that the convergence rate is determined by the unique solution of a non-linear equation that arises from the interplay between the degree of weak concavity and the stochastic perturbation bounds. We illustrate this general theory by deriving posterior convergence rates for three concrete examples: Bayesian logistic regression models, Bayesian single index models, and over-specified Bayesian mixture models.
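
As a sketch of the construction described above, one standard way to realize a posterior as the stationary distribution of an SDE is a Langevin-type diffusion. The notation here is an assumption made for illustration and is not taken from the abstract: $F_n$ denotes the sample log-likelihood averaged over $n$ observations, $\pi$ a smooth prior density, and $B_t$ a standard $d$-dimensional Brownian motion.

$$
d\theta_t = \frac{1}{2} \nabla F_n(\theta_t)\, dt + \frac{1}{2n} \nabla \log \pi(\theta_t)\, dt + \frac{1}{\sqrt{n}}\, dB_t
$$

Under suitable regularity conditions, the stationary density of this diffusion is proportional to $\pi(\theta) \exp(n F_n(\theta))$, which is the posterior density. Bounds on the moments of $\|\theta_t - \theta^*\|$ along the diffusion, where $\theta^*$ stands for the true parameter (again assumed notation), then carry over to the posterior in the limit $t \to \infty$.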
