论文信息 - Efficient Learning and Planning Within the Dyna Framework

Efficient Learning and Planning Within the Dyna Framework

The Dyna class of reinforcement learning architectures enables the creation of integrated learning, planning and reacting systems. A class of strategies designed to enhance the learning and planning power of Dyna systems by increasing their computational efficiency is examined. The benefit of using these strategies is demonstrated on some simple abstract learning tasks. It is proposed that the backups to be performed in Dyna be prioritized in order to improve its efficiency. It is demonstrated with simple tasks that use some specific prioritizing schemes can lead to significant reductions in computational effort and corresponding improvements in learning performance.<<ETX>>

[1] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .

[2] C. Watkins. Learning from delayed rewards , 1989 .

[3] L. Baird,et al. A MATHEMATICAL ANALYSIS OF ACTOR-CRITIC ARCHITECTURES FOR LEARNING OPTIMAL CONTROLS THROUGH INCREMENTAL DYNAMIC PROGRAMMING (cid:3) , 1990 .

[4] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.

[5] Richard S. Sutton,et al. Reinforcement Learning is Direct Adaptive Optimal Control , 1992, 1991 American Control Conference.

[6] Richard S. Sutton,et al. Planning by Incremental Dynamic Programming , 1991, ML.

[7] Andrew W. Moore,et al. Memory-based Reinforcement Learning: Converging with Less Data and Less Real Time , 1993 .