Finite-Time Convergence Rates of Nonlinear Two-Time-Scale Stochastic Approximation under Markovian Noise

We study two-time-scale stochastic approximation, a simulation-based approach for finding the roots of two coupled nonlinear operators. Our focus is to characterize its finite-time performance in a Markov setting, which often arises in stochastic control and reinforcement learning problems. In particular, we consider the scenario where the data in the method are generated by Markov processes and are therefore dependent. Such dependent data result in biased observations of the underlying operators. Under some fairly standard assumptions on the operators and the Markov processes, we provide a formula that characterizes the rate at which the mean square errors generated by the method converge to zero. Our result shows that the method converges in expectation at a rate $O(1/k)$, where $k$ is the number of iterations. Our analysis is mainly motivated by the classic singular perturbation theory for studying the asymptotic convergence of two-time-scale systems; that is, we consider a Lyapunov function that carefully characterizes the coupling between the two iterates. In addition, we utilize the geometric mixing time of the underlying Markov process to handle the bias and dependence in the data. Our theoretical result complements the existing literature, where the rate of nonlinear two-time-scale stochastic approximation under Markovian noise is unknown.

1 Nonlinear two-time-scale SA

Stochastic approximation (SA), introduced by [1], is a simulation-based approach for finding the root (or fixed point) of some unknown operator $F$ represented as an expectation, i.e., $F(x) = \mathbb{E}_{\pi}[F(x; \xi)]$, where $\xi$ is a random variable with distribution $\pi$. Specifically, this method seeks a point $x^\star$ such that $F(x^\star) = 0$ based on the noisy observations $F(x; \xi)$. The iterate $x$ is updated by moving along the direction of $F(x; \xi)$ scaled by some step size.
Through a careful choice of this step size, the “noise” induced by the random samples $\xi$ can be averaged out across iterations, and the algorithm converges to $x^\star$. SA has found broad applications in many areas including statistics, stochastic optimization, machine learning, and reinforcement learning [2, 3, 4].

In this paper, we consider two-time-scale SA, a generalized variant of the classic SA, which is used to find the root of a system of two coupled nonlinear equations. Given two unknown operators $F : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$ and $G : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$ represented by $F(x, y) = \mathbb{E}_{\pi}[F(x, y; \xi)]$ and $G(x, y) = \mathbb{E}_{\pi}[G(x, y; \xi)]$, we seek to find $x^\star$ and $y^\star$ such that

\[
\begin{cases}
F(x^\star, y^\star) = 0, \\
G(x^\star, y^\star) = 0.
\end{cases}
\tag{1}
\]

*Thinh T. Doan is with the Bradley Department of Electrical and Computer Engineering, Virginia Tech, USA. Email: thinhdoan@vt.edu
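As a rough illustration of the root-finding problem (1), the following Python sketch runs a two-time-scale iteration on a toy scalar example driven by a two-state Markov chain. Everything specific here is an assumption made for illustration and is not taken from the text above: the linear operators `F_sample` and `G_sample`, the noise chain on {-1, +1} with `stay_prob = 0.9`, the minus sign in the updates, and the step sizes `alpha` and `beta`. For these toy operators the stationary-mean equations have the root (x⋆, y⋆) = (1.6, 1.2).

```python
import random


def markov_noise(state, rng, stay_prob=0.9):
    """One step of a two-state Markov chain on {-1.0, +1.0} (hypothetical noise model).

    The chain is symmetric, so its stationary distribution is uniform and the
    stationary mean of the noise is zero; consecutive samples are dependent.
    """
    return state if rng.random() < stay_prob else -state


def F_sample(x, y, xi):
    """Noisy sample of the toy operator F(x, y) = x - 0.5*y - 1 (assumed for illustration)."""
    return x - 0.5 * y - 1.0 + xi


def G_sample(x, y, xi):
    """Noisy sample of the toy operator G(x, y) = y + 0.5*x - 2 (assumed for illustration)."""
    return y + 0.5 * x - 2.0 + xi


def two_time_scale_sa(num_iters=100_000, seed=0):
    """Run the two coupled updates with a larger step for x and a smaller step for y."""
    rng = random.Random(seed)
    x, y, xi = 0.0, 0.0, 1.0
    for k in range(num_iters):
        alpha = 1.0 / (k + 2) ** (2.0 / 3.0)  # fast time scale (assumed schedule)
        beta = 1.0 / (k + 2)                  # slow time scale (assumed schedule)
        # Evaluate both operators at the current iterates with the same sample.
        fx = F_sample(x, y, xi)
        gy = G_sample(x, y, xi)
        x -= alpha * fx
        y -= beta * gy
        # The next sample depends on the current one (Markovian, not i.i.d.).
        xi = markov_noise(xi, rng)
    return x, y


if __name__ == "__main__":
    x, y = two_time_scale_sa()
    print(f"x = {x:.3f} (target 1.6), y = {y:.3f} (target 1.2)")
```

Because the samples come from a Markov chain rather than being drawn independently, each observation is a biased estimate of the underlying operator at any finite time; the sketch relies on the chain mixing quickly so that this bias averages out as the step sizes decay.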

[1] R. Srikant, et al. On the Convergence Rate of Distributed Gradient Methods for Finite-Sum Optimization under Communication Delays, 2017, Proc. ACM Meas. Anal. Comput. Syst.

[2] Mikael Johansson, et al. A Randomized Incremental Subgradient Method for Distributed Optimization in Networked Systems, 2009, SIAM J. Optim.

[3] V. Climenhaga. Markov chains and mixing times, 2013.

[4] A. Mokkadem, et al. Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms, 2006, arXiv:math/0610329.

[5] Shie Mannor, et al. Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning, 2017, COLT.

[6] Thinh T. Doan, et al. Linear Two-Time-Scale Stochastic Approximation: A Finite-Time Analysis, 2019, 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[7] B. Anderson, et al. ROBUST IDENTIFICATION OF , 2005.

[8] Hoi-To Wai, et al. Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise, 2020, COLT.

[9] Junyu Zhang, et al. A Stochastic Composite Gradient Method with Incremental Variance Reduction, 2019, NeurIPS.

[10] R. Srikant, et al. Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning, 2019, NeurIPS.

[11] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008.

[12] Zhaoran Wang, et al. A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic, 2020, ArXiv.

[13] D. Ruppert, et al. Efficient Estimations from a Slowly Convergent Robbins-Monro Process, 1988.

[14] Shalabh Bhatnagar, et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, 2009, NIPS.

[15] Ana Busic, et al. Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation, 2020, AISTATS.

[16] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[17] Thinh T. Doan, et al. Performance of Q-learning with Linear Function Approximation: Stability and Finite-Time Analysis, 2019.

[18] J. Tsitsiklis, et al. Convergence rate of linear two-time-scale stochastic approximation, 2004, arXiv:math/0405287.

[19] Thinh T. Doan, et al. Finite-Time Analysis and Restarting Scheme for Linear Two-Time-Scale Stochastic Approximation, 2019, SIAM J. Control. Optim.

[20] Balázs Szörényi, et al. A Tale of Two-Timescale Reinforcement Learning with the Tightest Finite-Time Bound, 2019, AAAI.

[21] Xian Wu, et al. Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms, 2020, NeurIPS.

[22] Thinh T. Doan, et al. Distributed two-time-scale methods over clustered networks, 2020, 2021 American Control Conference (ACC).

[23] R. Srikant, et al. Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning, 2019, COLT.

[24] Vivek S. Borkar, et al. An actor-critic algorithm for constrained Markov decision processes, 2005, Syst. Control. Lett.

[25] Martin J. Wainwright, et al. On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration, 2020, COLT.

[26] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.

[27] Sajad Khodadadian, et al. Finite Sample Analysis of Two-Time-Scale Natural Actor-Critic Algorithm, 2021.

[28] Angelia Nedic, et al. Incremental Stochastic Subgradient Algorithms for Convex Optimization, 2008, SIAM J. Optim.

[29] Wotao Yin, et al. On Markov Chain Gradient Descent, 2018, NeurIPS.

[30] Nando de Freitas, et al. An Introduction to MCMC for Machine Learning, 2004, Machine Learning.

[31] Quanquan Gu, et al. A Finite Time Analysis of Two Time-Scale Actor Critic Methods, 2020, NeurIPS.

[32] Thinh T. Doan, et al. Convergence Rates of Distributed Gradient Methods Under Random Quantization: A Stochastic Approximation Approach, 2021, IEEE Transactions on Automatic Control.

[33] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.

[34] Thinh T. Doan, et al. Nonlinear Two-Time-Scale Stochastic Approximation: Convergence and Finite-Time Performance, 2020, IEEE Transactions on Automatic Control.

[35] Adam Wierman, et al. Finite-Time Analysis of Asynchronous Stochastic Approximation and Q-Learning, 2020, COLT.

[36] Mengdi Wang, et al. Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions, 2014, Mathematical Programming.

[37] Eric Moulines, et al. Non-asymptotic Analysis of Biased Stochastic Approximation Scheme, 2019, COLT.

[38] Lam M. Nguyen, et al. Convergence Rates of Accelerated Markov Gradient Descent with Applications in Reinforcement Learning, 2020, arXiv:2002.02873.

[39] Siva Theja Maguluri, et al. Finite-Sample Analysis of Contractive Stochastic Approximation Using Smooth Convex Envelopes, 2020, NeurIPS.

[40] H. Robbins. A Stochastic Approximation Method, 1951.

[41] Boris Polyak, et al. Acceleration of stochastic approximation by averaging, 1992.