论文信息 - On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

We consider a totally asynchronous stochastic approximation algorithm, Q-learning, for solving finite space stochastic shortest path SSP problems, which are undiscounted, total cost Markov decision processes with an absorbing and cost-free state. For the most commonly used SSP models, existing convergence proofs assume that the sequence of Q-learning iterates is bounded with probability one, or some other condition that guarantees boundedness. We prove that the sequence of iterates is naturally bounded with probability one, thus furnishing the boundedness condition in the convergence proof by Tsitsiklis [Tsitsiklis JN 1994 Asynchronous stochastic approximation and Q-learning. Machine Learn. 16:185--202] and establishing completely the convergence of Q-learning for these SSP models.

Dimitri P. Bertsekas | Huizhen Yu | D. Bertsekas | Huizhen Yu

[1] John N. Tsitsiklis,et al. An Analysis of Stochastic Shortest Path Problems , 1991, Math. Oper. Res..

[2] John N. Tsitsiklis,et al. Asynchronous stochastic approximation and Q-learning , 1993, Proceedings of 32nd IEEE Conference on Decision and Control.

[3] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[4] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..

[5] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[6] Vivek S. Borkar,et al. Stochastic Approximation for Nonexpansive Maps: Application to Q-Learning Algorithms , 1997, SIAM J. Control. Optim..

[7] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003 .

[8] E. Seneta. Non-negative Matrices and Markov Chains , 2008 .

[9] Dimitri P. Bertsekas,et al. Q-learning and policy iteration algorithms for stochastic shortest path problems , 2012, Annals of Operations Research.

[10] Huizhen Yu. Stochastic Shortest Path Games and Q-Learning , 2014, 1412.8570.