Asynchronous stochastic approximation and Q-learning

This paper provides some general results on the convergence of a class of stochastic approximation algorithms and of their parallel and asynchronous variants. The author then uses these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establishes its convergence under conditions more general than those previously available.
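To make the object of study concrete, the following is a minimal sketch of tabular Q-learning on an assumed toy Markov decision problem (a five-state chain with a rewarded goal state; the environment, step sizes, and exploration scheme are illustrative choices, not taken from the paper). The inner update is the asynchronous stochastic-approximation iteration whose convergence the paper analyzes.

```python
import random

# Assumed toy MDP: a 5-state chain; action 1 moves right, action 0 moves
# left, and reaching the last state ends the episode with reward 1.
N_STATES, ACTIONS = 5, (0, 1)
GAMMA, ALPHA, EPS = 0.9, 0.1, 0.1  # discount, step size, exploration rate

def step(s, a):
    """Deterministic chain dynamics; reward 1 on reaching the goal state."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

def q_learning(episodes=2000, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection, with random tie-breaking
            if rng.random() < EPS:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: (Q[(s, x)], rng.random()))
            s2, r, done = step(s, a)
            # Q-learning update: a stochastic-approximation step toward
            # the one-step bootstrapped target
            target = r if done else r + GAMMA * max(Q[(s2, x)] for x in ACTIONS)
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s = s2
    return Q
```

Here only one state-action pair is updated per step, with the rest left unchanged; this partial, sample-driven updating is exactly the asynchronous character of the iteration that the paper's convergence results cover.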
