Accelerating Optimization and Reinforcement Learning with Quasi Stochastic Approximation

The ODE method has been a workhorse for algorithm design and analysis since the introduction of stochastic approximation. It is now understood that convergence theory amounts to establishing robustness of Euler approximations for ODEs, while the theory of convergence rates requires finer analysis. This paper sets out to extend this theory to quasi-stochastic approximation, in which the "noise" is generated by deterministic signals. The main results are obtained under minimal assumptions: the usual Lipschitz conditions on the ODE vector field, and the existence of a well-defined linearization near the optimal parameter $\theta^*$, with Hurwitz linearization matrix $A^*$. The main contributions are summarized as follows:

(i) If the algorithm gain is $a_t=g/(1+t)^\rho$ with $g>0$ and $\rho\in(0,1)$, then the rate of convergence of the algorithm is $1/t^\rho$. There is also a well-defined "finite-$t$" approximation:
\[
a_t^{-1}\{\Theta_t-\theta^*\}=\bar{Y}+\Xi^{\mathrm{I}}_t+o(1),
\]
where $\bar{Y}\in\mathbb{R}^d$ is a vector identified in the paper, and $\{\Xi^{\mathrm{I}}_t\}$ is bounded with zero temporal mean.

(ii) With gain $a_t = g/(1+t)$ the results are not as sharp: the rate of convergence $1/t$ holds only if $I + g A^*$ is Hurwitz.

(iii) Based on Ruppert-Polyak averaging for stochastic approximation, one would expect that a convergence rate of $1/t$ can be obtained by averaging:
\[
\Theta^{\text{RP}}_T=\frac{1}{T}\int_{0}^T \Theta_t\,dt,
\]
where the estimates $\{\Theta_t\}$ are obtained using the gain in (i). The preceding sharp bounds imply that averaging results in a $1/t$ convergence rate if and only if $\bar{Y}={\sf 0}$. This condition holds if the noise is additive, but appears to fail in general.

(iv) The theory is illustrated with applications to gradient-free optimization and policy gradient algorithms for reinforcement learning.
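
To make the setting of (i) and (iii) concrete, the following is a minimal sketch of quasi-stochastic approximation for gradient-free optimization: a two-measurement gradient surrogate driven by sinusoidal (deterministic) probing signals, with gain $a_t=g/(1+t)^\rho$ and Ruppert-Polyak averaging of the estimates. The quadratic objective, gain parameters, probing amplitude, and frequencies are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of quasi-stochastic approximation (QSA) for gradient-free
# optimization.  The objective, gain parameters, probing amplitude and
# frequencies below are illustrative assumptions, not taken from the paper.
import numpy as np

def loss(theta):
    # Hypothetical quadratic objective with minimizer theta* = (1, -2).
    return 0.5 * (theta[0] - 1.0) ** 2 + 0.5 * (theta[1] + 2.0) ** 2

d = 2
g, rho = 1.0, 0.7                       # gain a_t = g/(1+t)^rho, rho in (0, 1)
eps = 0.1                               # probing amplitude
omega = np.array([1.0, np.sqrt(2.0)])   # incommensurate probing frequencies
dt = 0.01                               # Euler step for the continuous-time recursion
n_steps = 200_000                       # horizon T = n_steps * dt

theta = np.zeros(d)
theta_sum = np.zeros(d)                 # running sum for Ruppert-Polyak averaging
for k in range(1, n_steps + 1):
    t = k * dt
    a_t = g / (1.0 + t) ** rho
    xi = np.sin(omega * t)              # deterministic "noise" replacing random probes
    # Two-measurement gradient surrogate (SPSA / extremum-seeking style):
    # its time average over the probing signal approximates grad loss(theta).
    grad_est = xi * (loss(theta + eps * xi) - loss(theta - eps * xi)) / eps
    theta = theta - dt * a_t * grad_est
    theta_sum += theta

theta_rp = theta_sum / n_steps          # averaged estimate Theta^RP_T
print("final estimate:   ", theta)
print("averaged estimate:", theta_rp)
```

Because the probing frequencies are incommensurate, the temporal mean of $2\xi_t\xi_t^{\intercal}$ is the identity, so the gradient surrogate tracks $\nabla f(\Theta_t)$ on average and the recursion follows the mean ODE $\tfrac{d}{dt}\Theta_t = -a_t\nabla f(\Theta_t)$, up to a bias of order $\varepsilon^2$.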
