Oracle lower bounds for stochastic gradient sampling algorithms

We consider the problem of sampling from a strongly log-concave density in $\mathbb{R}^{d}$ and prove an information-theoretic lower bound on the number of stochastic gradient queries of the log density that are needed. Several popular sampling algorithms, including many Markov chain Monte Carlo methods, generate a sample using stochastic gradients of the log density; our results establish an information-theoretic limit for all of these algorithms. We show that for every algorithm there exists a well-conditioned strongly log-concave target density for which the distribution of the points generated by the algorithm is at least $\varepsilon$ away from the target in total variation distance if the number of gradient queries is less than $\Omega(\sigma^2 d/\varepsilon^2)$, where $\sigma^2 d$ is the variance of the stochastic gradient. Our lower bound follows by combining the ideas of Le Cam deficiency, routinely used in the comparison of statistical experiments, with standard information-theoretic tools used in lower bounding Bayes risk functions. To the best of our knowledge, our results provide the first nontrivial dimension-dependent lower bound for this problem.
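To make the query model concrete, the following is a minimal sketch, not the construction used in the proof. It assumes a stochastic gradient oracle that returns $\nabla f(x) + \xi$ with $\xi \sim \mathcal{N}(0, \sigma^2 I_d)$, so that the noise variance is $\mathbb{E}\|\xi\|^2 = \sigma^2 d$ as in the bound above. It uses stochastic gradient Langevin dynamics (SGLD) as one example of an algorithm in the class covered by the lower bound, with one oracle query per iteration. The helper names `make_noisy_gradient_oracle`, `sgld`, and `query_scale`, the Gaussian noise model, and the Gaussian target $\mathcal{N}(\mu, I_d)$ are all hypothetical choices made for illustration. `query_scale` returns only the order $\sigma^2 d/\varepsilon^2$, because the absolute constant hidden in $\Omega(\cdot)$ is not stated here.

```python
import numpy as np


def make_noisy_gradient_oracle(grad_f, sigma, rng):
    """Hypothetical stochastic gradient oracle (illustrative assumption):
    returns grad_f(x) + xi with xi ~ N(0, sigma^2 I_d), so E||xi||^2 = sigma^2 * d.
    The returned dict counts how many queries have been made."""
    calls = {"n": 0}

    def oracle(x):
        calls["n"] += 1
        return grad_f(x) + sigma * rng.standard_normal(x.shape)

    return oracle, calls


def sgld(oracle, x0, step, n_steps, rng):
    """Stochastic gradient Langevin dynamics for pi(x) ∝ exp(-f(x)):
    x <- x - step * g + sqrt(2 * step) * z, using one oracle query per step."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        g = oracle(x)  # noisy estimate of grad f(x)
        x = x - step * g + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x


def query_scale(sigma, d, eps):
    """Order sigma^2 d / eps^2 of the query lower bound (constant unspecified)."""
    return sigma**2 * d / eps**2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, sigma = 10, 1.0
    mu = np.ones(d)
    # Well-conditioned strongly log-concave target: N(mu, I_d), f(x) = ||x - mu||^2 / 2.
    oracle, calls = make_noisy_gradient_oracle(lambda x: x - mu, sigma, rng)
    samples = np.array([sgld(oracle, np.zeros(d), 0.05, 200, rng) for _ in range(200)])
    print("total oracle queries:", calls["n"])
    print("sample mean (target mean is 1 in every coordinate):", samples.mean(axis=0).round(2))
    print("order of queries suggested by the bound for eps = 0.1:", query_scale(sigma, d, 0.1))
```

With a fixed step size $h$, the chain's stationary law only approximates the target. For this Gaussian example a short calculation gives a per-coordinate stationary variance of $(h\sigma^2 + 2)/(2 - h) \approx 1.05$ for $h = 0.05$, $\sigma = 1$, rather than $1$. Discrepancies of this kind are what the total variation criterion measures.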
