On Stochastic Gradient Langevin Dynamics with Dependent Data Streams: The Fully Nonconvex Case

We consider the problem of sampling from a target distribution that is \emph{not necessarily logconcave}, in the context of empirical risk minimization and stochastic optimization as presented in Raginsky et al. (2017). We establish non-asymptotic bounds in the $L^1$-Wasserstein distance for the behaviour of Stochastic Gradient Langevin Dynamics (SGLD) algorithms. The gradient estimates may be computed from \emph{dependent} data streams. Our convergence estimates are sharper than those in previous studies and, unlike them, are \emph{uniform} in the number of iterations.
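
For orientation, the SGLD recursion is commonly written in the following form. This is a sketch in assumed notation, since the text above does not fix any symbols: $\lambda > 0$ is a step size, $\beta > 0$ an inverse temperature, $H(\theta, x)$ a stochastic estimate of the gradient $\nabla U(\theta)$ of a potential $U$, $(X_n)$ the (possibly dependent) data stream, and $(\xi_n)$ an i.i.d. sequence of standard Gaussian vectors.

$$\theta_{n+1} = \theta_n - \lambda\, H(\theta_n, X_{n+1}) + \sqrt{2\lambda/\beta}\;\xi_{n+1}, \qquad n \ge 0.$$

Under this notation the target distribution would be $\pi_\beta(\mathrm{d}\theta) \propto \exp(-\beta U(\theta))\,\mathrm{d}\theta$, and the distance between the law of $\theta_n$ and $\pi_\beta$ is measured by

$$W_1(\mu, \nu) = \inf_{\zeta \in \mathcal{C}(\mu, \nu)} \int |\theta - \theta'| \,\zeta(\mathrm{d}\theta, \mathrm{d}\theta'),$$

where $\mathcal{C}(\mu, \nu)$ denotes the set of couplings of $\mu$ and $\nu$. A bound that is uniform in the number of iterations is then one whose right-hand side does not grow with $n$.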

[1]  Gershon Wolansky,et al.  Optimal Transport , 2021 .

[2]  C. Villani Optimal Transport: Old and New , 2008 .

[3]  Nicholas G. Polson,et al.  Deep learning for finance: deep portfolios , 2017 .

[4]  Yee Whye Teh,et al.  Bayesian Learning via Stochastic Gradient Langevin Dynamics , 2011, ICML.

[5]  A. Eberle,et al.  Quantitative Harris-type theorems for diffusions and McKean–Vlasov processes , 2016, Transactions of the American Mathematical Society.

[6]  Mateusz B. Majka,et al.  Nonasymptotic bounds for sampling algorithms without log-concavity , 2018, The Annals of Applied Probability.

[7]  L. Gerencsér On a class of mixing processes , 1989 .

[8]  V. Borkar Controlled diffusion processes , 2005, math/0511077.

[9]  Arnak S. Dalalyan,et al.  Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent , 2017, COLT.

[10]  M. Yor,et al.  Continuous martingales and Brownian motion , 1990 .

[11]  É. Moulines,et al.  The tamed unadjusted Langevin algorithm , 2017, Stochastic Processes and their Applications.

[12]  R. Cont Empirical properties of asset returns: stylized facts and statistical issues , 2001 .

[13]  Alʹbert Nikolaevich Shiri︠a︡ev,et al.  Statistics of random processes , 1977 .

[14]  Arnak S. Dalalyan,et al.  Sparse Regression Learning by Aggregation and Langevin Monte-Carlo , 2009, COLT.

[15]  Peter L. Bartlett,et al.  Convergence of Langevin MCMC in KL-divergence , 2017, ALT.

[16]  J. Neveu,et al.  Discrete Parameter Martingales , 1975 .

[17]  L. Arnold Stochastic Differential Equations: Theory and Applications , 1992 .

[18]  Gilles Pagès,et al.  Stochastic approximation with averaging innovation applied to Finance , 2010, Monte Carlo Methods Appl..

[19]  Xuerong Mao,et al.  Stochastic differential equations and their applications , 1997 .

[20]  P. Meyer,et al.  Probabilités et potentiel , 1966 .

[21]  A. Eberle Couplings, distances and contractivity for diffusion processes revisited , 2013 .

[22]  Forecasting the Term Structure of Interest Rates with Dynamic Constrained Smoothing B-Splines , 2020, Journal of Risk and Financial Management.

[23]  A. Siegel,et al.  Parsimonious modeling of yield curves , 1987 .

[24]  Matus Telgarsky,et al.  Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis , 2017, COLT.

[25]  É. Moulines,et al.  On stochastic gradient Langevin dynamics with dependent data streams in the logconcave case , 2018, Bernoulli.

[26]  Gilles Pagès,et al.  Optimal Split of Orders Across Liquidity Pools: A Stochastic Algorithm Approach , 2009, SIAM J. Financial Math..

[27]  Michael I. Jordan,et al.  Sharp Convergence Rates for Langevin Dynamics in the Nonconvex Setting , 2018, ArXiv.

[28]  Bart De Schutter,et al.  Forecasting spot electricity prices: Deep learning approaches and empirical comparison of traditional algorithms , 2018 .

[29]  D. Vere-Jones Markov Chains , 1972, Nature.

[30]  M. Aschwanden Statistics of Random Processes , 2021, Biomedical Measurement Systems and Data Science.

[31]  Huy N. Chau,et al.  On fixed gain recursive estimators with discontinuity in the parameters , 2016, ESAIM: Probability and Statistics.

[32]  A. Dalalyan Theoretical guarantees for approximate sampling from smooth and log‐concave densities , 2014, 1412.7392.

[33]  É. Moulines,et al.  Non-asymptotic convergence analysis for the Unadjusted Langevin Algorithm , 2015, 1507.05021.

[34]  M. Rosenbaum,et al.  Volatility is rough , 2018 .

[35]  Arnak S. Dalalyan,et al.  User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient , 2017, Stochastic Processes and their Applications.

[36]  Ying Zhang,et al.  Higher order Langevin Monte Carlo algorithm , 2018, Electronic Journal of Statistics.

[37]  Jinghui Chen,et al.  Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization , 2017, NeurIPS.

[38]  Gilles Pagès,et al.  Optimal posting price of limit orders: learning by trading , 2011, Mathematics and Financial Economics.

[39]  A. Eberle,et al.  Couplings and quantitative contraction rates for Langevin dynamics , 2017, The Annals of Probability.

[40]  E. Lenglart,et al.  Relation de domination entre deux processus , 1977 .

[41]  Alain Durmus,et al.  High-dimensional Bayesian inference via the unadjusted Langevin algorithm , 2016, Bernoulli.

[42]  Nageswara S. V. Rao,et al.  Function Estimation by Feedforward Sigmoidal Networks with Bounded Weights , 1996, Neural Processing Letters.