Efficient Learning and Planning Within the Dyna Framework

Sutton's Dyna framework provides a novel and computationally appealing way to integrate learning, planning, and reacting in autonomous agents. Examined here is a class of strategies designed to enhance the learning and planning power of Dyna systems by increasing their computational efficiency. The benefit of using these strategies is demonstrated on some simple abstract learning tasks.

[1]  Arthur L. Samuel,et al.  Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..

[2]  E. Feigenbaum,et al.  Computers and Thought , 1963 .

[3]  Nils J. Nilsson,et al.  Principles of Artificial Intelligence , 1980, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  John H. Holland,et al.  Escaping brittleness: the possibilities of general-purpose learning algorithms applied to parallel rule-based systems , 1995 .

[5]  Dimitri P. Bertsekas,et al.  Dynamic Programming: Deterministic and Stochastic Models , 1987 .

[6]  John N. Tsitsiklis,et al.  Parallel and distributed computation , 1989 .

[7]  A. Barto,et al.  Learning and Sequential Decision Making , 1989 .

[8]  L. Baird,et al.  A MATHEMATICAL ANALYSIS OF ACTOR-CRITIC ARCHITECTURES FOR LEARNING OPTIMAL CONTROLS THROUGH INCREMENTAL DYNAMIC PROGRAMMING , 1990 .

[9]  Richard S. Sutton,et al.  Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.

[10]  Richard S. Sutton,et al.  Reinforcement Learning is Direct Adaptive Optimal Control , 1992, 1991 American Control Conference.

[11]  Richard S. Sutton,et al.  Planning by Incremental Dynamic Programming , 1991, ML.

[12]  Andrew W. Moore,et al.  Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping , 1992, NIPS.

[13]  G. Tesauro Practical Issues in Temporal Difference Learning , 1992 .

[14]  Andrew W. Moore,et al.  Memory-based Reinforcement Learning: Converging with Less Data and Less Real Time , 1993 .

[15]  Ben J. A. Kröse,et al.  Learning from delayed rewards , 1995, Robotics Auton. Syst..