Reinforcement Learning Methods for Continuous-Time Markov Decision Problems
暂无分享,去创建一个
[1] E. Denardo. CONTRACTION MAPPINGS IN THE THEORY UNDERLYING DYNAMIC PROGRAMMING , 1967 .
[2] B. Hajek. Optimal control of two interacting service stations , 1982, 1982 21st IEEE Conference on Decision and Control.
[3] B. Hajek. Optimal control of two interacting service stations , 1982, CDC 1982.
[4] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .
[5] C. Watkins. Learning from delayed rewards , 1989 .
[6] John Moody,et al. Learning rate schedules for faster stochastic gradient search , 1992, Neural Networks for Signal Processing II Proceedings of the 1992 IEEE Workshop.
[7] John N. Tsitsiklis,et al. Asynchronous stochastic approximation and Q-learning , 1993, Proceedings of 32nd IEEE Conference on Decision and Control.
[8] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .
[9] P. Dayan,et al. TD(λ) converges with probability 1 , 2004, Machine Learning.
[10] Steven J. Bradtke,et al. Incremental dynamic programming for on-line adaptive optimal control , 1995 .
[11] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..