Reinforcement learning in continuous time: advantage updating
暂无分享,去创建一个
[1] R. Bellman. Dynamic programming. , 1957, Science.
[2] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .
[3] L. Baird,et al. A MATHEMATICAL ANALYSIS OF ACTOR-CRITIC ARCHITECTURES FOR LEARNING OPTIMAL CONTROLS THROUGH INCREMENTAL DYNAMIC PROGRAMMING , 1990 .
[4] L. C. Baird. Function minimization for dynamic programming using connectionist networks , 1992, [Proceedings] 1992 IEEE International Conference on Systems, Man, and Cybernetics.
[5] Steven J. Bradtke,et al. Reinforcement Learning Applied to Linear Quadratic Regulation , 1992, NIPS.
[6] James S. Morgan,et al. A Hierarchical Network of Control Systems that Learn: Modeling Nervous System Function During Classical and Instrumental Conditioning , 1993, Adapt. Behav..
[7] A. Harry Klopf,et al. A Hierarchical Network of Provably Optimal Learning Control Systems: Extensions of the Associative Control Process (ACP) Network , 1993, Adapt. Behav..
[8] Leemon C Baird,et al. Reinforcement Learning With High-Dimensional, Continuous Actions , 1993 .
[9] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .
[10] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..