On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems
暂无分享,去创建一个
[1] John N. Tsitsiklis,et al. An Analysis of Stochastic Shortest Path Problems , 1991, Math. Oper. Res..
[2] John N. Tsitsiklis,et al. Asynchronous stochastic approximation and Q-learning , 1993, Proceedings of 32nd IEEE Conference on Decision and Control.
[3] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[4] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[5] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[6] Vivek S. Borkar,et al. Stochastic Approximation for Nonexpansive Maps: Application to Q-Learning Algorithms , 1997, SIAM J. Control. Optim..
[7] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003 .
[8] E. Seneta. Non-negative Matrices and Markov Chains , 2008 .
[9] Dimitri P. Bertsekas,et al. Q-learning and policy iteration algorithms for stochastic shortest path problems , 2012, Annals of Operations Research.
[10] Huizhen Yu. Stochastic Shortest Path Games and Q-Learning , 2014, 1412.8570.