Efficient Learning and Planning Within the Dyna Framework
[1] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.
[2] E. Feigenbaum, et al. Computers and Thought, 1963.
[3] Nils J. Nilsson, et al. Principles of Artificial Intelligence, 1980, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[4] John H. Holland, et al. Escaping brittleness: the possibilities of general-purpose learning algorithms applied to parallel rule-based systems, 1995.
[5] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.
[6] John N. Tsitsiklis, et al. Parallel and distributed computation, 1989.
[7] A. Barto, et al. Learning and Sequential Decision Making, 1989.
[8] L. Baird, et al. A Mathematical Analysis of Actor-Critic Architectures for Learning Optimal Controls Through Incremental Dynamic Programming, 1990.
[9] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[10] Richard S. Sutton, et al. Reinforcement Learning is Direct Adaptive Optimal Control, 1992, 1991 American Control Conference.
[11] Richard S. Sutton, et al. Planning by Incremental Dynamic Programming, 1991, ML.
[12] Andrew W. Moore, et al. Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping, 1992, NIPS.
[13] G. Tesauro. Practical Issues in Temporal Difference Learning, 1992.
[14] Andrew W. Moore, et al. Memory-based Reinforcement Learning: Converging with Less Data and Less Real Time, 1993.
[15] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.