Efficient Learning and Planning Within the Dyna Framework
[1] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.
[2] C. Watkins. Learning from Delayed Rewards, 1989.
[3] L. Baird, et al. A Mathematical Analysis of Actor-Critic Architectures for Learning Optimal Controls Through Incremental Dynamic Programming, 1990.
[4] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[5] Richard S. Sutton, et al. Reinforcement Learning is Direct Adaptive Optimal Control, 1992, 1991 American Control Conference.
[6] Richard S. Sutton, et al. Planning by Incremental Dynamic Programming, 1991, ML.
[7] Andrew W. Moore, et al. Memory-based Reinforcement Learning: Converging with Less Data and Less Real Time, 1993.