Quantitative Central Limit Theorems for Discrete Stochastic Processes

In this paper, we establish a generalization of the classical Central Limit Theorem for a family of stochastic processes that includes stochastic gradient descent and related gradient-based algorithms. Under certain regularity assumptions, we show that the iterates of these stochastic processes converge to an invariant distribution at a rate of $O(1/\sqrt{k})$, where $k$ is the number of steps; this rate is provably tight.
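
To make the family of processes concrete, the following is a minimal simulation sketch (not the paper's construction or proof): constant-step-size SGD on a one-dimensional quadratic with additive Gaussian gradient noise. The iterates form a Markov chain of the kind the abstract describes, and their marginal law empirically settles into an invariant Gaussian distribution. The step size `eta` and noise level `sigma` are illustrative choices, not parameters taken from the paper.

```python
# Hypothetical illustration: constant-step-size SGD on f(x) = x^2 / 2
# with Gaussian gradient noise; the iterates' marginal law approaches
# an invariant (Gaussian) distribution.
import numpy as np

rng = np.random.default_rng(0)

eta = 0.1      # constant step size (illustrative choice)
sigma = 1.0    # gradient-noise standard deviation (illustrative choice)
n_chains = 10_000
n_steps = 500

x = np.zeros(n_chains)  # run many independent chains in parallel
for _ in range(n_steps):
    noisy_grad = x + sigma * rng.standard_normal(n_chains)  # grad f(x) = x, plus noise
    x = x - eta * noisy_grad

# For this linear recursion the invariant law is Gaussian with
# variance eta * sigma^2 / (2 - eta); compare against the empirical value.
print("empirical variance: ", x.var())
print("stationary variance:", eta * sigma**2 / (2 - eta))
```

In this toy case the stationary variance can be computed in closed form, which makes it easy to check that the simulated chain has equilibrated; the paper's results concern quantitative rates for the general, nonlinear setting.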
