Stochastic approximation with cone-contractive operators: Sharp ℓ∞-bounds for Q-learning

Motivated by Q-learning algorithms in reinforcement learning, we study a class of stochastic approximation procedures based on operators that satisfy monotonicity and quasi-contractivity conditions with respect to an underlying cone. We prove a general sandwich relation on the iterate error at each time, and use it to derive non-asymptotic bounds on the error in terms of a cone-induced gauge norm. These results are derived within a deterministic framework, requiring no assumptions on the noise. We illustrate the general bounds by applying them to synchronous Q-learning for discounted Markov decision processes with discrete state-action spaces, in particular by deriving non-asymptotic bounds on the ℓ∞-norm error for a range of stepsizes. These bounds are the sharpest known to date, and we show via simulation that their dependence on the problem parameters cannot be improved in a worst-case sense. They also show that, relative to model-based Q-iteration, the ℓ∞-based sample complexity of Q-learning is suboptimal in terms of the discount factor γ.
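
For readers unfamiliar with the synchronous setting, the following is a minimal sketch of a synchronous Q-learning iteration on a small tabular MDP, not the authors' code. It assumes a generative model that yields one sampled next state for every state-action pair at each iteration. The function name, the nested-list layout of `P` and `r`, and the rescaled linear stepsize 1 / (1 + (1 - γ)k) are illustrative assumptions rather than details taken from the abstract.

```python
import random


def synchronous_q_learning(P, r, gamma, num_iters, seed=0):
    """Hypothetical sketch of synchronous Q-learning on a tabular MDP.

    P[s][a] is a list of next-state probabilities, r[s][a] is the reward,
    and gamma is the discount factor in (0, 1).
    """
    rng = random.Random(seed)
    n_states, n_actions = len(r), len(r[0])
    states = list(range(n_states))
    Q = [[0.0] * n_actions for _ in range(n_states)]

    for k in range(1, num_iters + 1):
        # Assumed stepsize choice (rescaled linear); other schedules are possible.
        lam = 1.0 / (1.0 + (1.0 - gamma) * k)
        # Greedy value of each state under the current iterate.
        V = [max(row) for row in Q]
        Q_next = [[0.0] * n_actions for _ in range(n_states)]
        for s in states:
            for a in range(n_actions):
                # Synchronous: one fresh sample for every (s, a) pair.
                s_next = rng.choices(states, weights=P[s][a])[0]
                target = r[s][a] + gamma * V[s_next]
                Q_next[s][a] = (1.0 - lam) * Q[s][a] + lam * target
        Q = Q_next
    return Q


def sup_norm_error(Q, Q_star):
    """Largest absolute entrywise gap between Q and a reference table Q_star."""
    return max(
        abs(q - q_star)
        for row, row_star in zip(Q, Q_star)
        for q, q_star in zip(row, row_star)
    )
```

On a toy problem with two absorbing states, one action, rewards 1 and 0, and γ = 0.5, the optimal table is `[[2.0], [0.0]]`, so `sup_norm_error(synchronous_q_learning(P, r, 0.5, 2000), [[2.0], [0.0]])` gives a quick check that the ℓ∞ error shrinks as the iteration count grows.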
