Reinforcement Learning with Function Approximation Converges to a Region
暂无分享,去创建一个
[1] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[2] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .
[3] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[4] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[5] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[6] Richard S. Sutton,et al. Open Theoretical Questions in Reinforcement Learning , 1999, EuroCOLT.
[7] Benjamin Van Roy,et al. On the existence of fixed points for approximate value iteration and temporal-difference learning , 2000 .