In this paper, a CMAC-Q-learning-based Dyna agent is presented to alleviate the slow learning speed of reinforcement learning, with the goals of shortening the training process and accelerating learning. We combine CMAC, Q-learning, and prioritized sweeping to construct a Dyna agent in which a Q-learning component is trained for policy learning, while model approximators, called the CMAC-model and the CMAC-R-model, are in charge of approximating the environment model. The approximated model provides the Q-learning component with virtual interaction experience so that the policy can be further updated during the gaps when there is no interplay between the agent and the real environment. The Dyna agent thus switches seamlessly between the real environment and the virtual environment model for policy learning. A simulation of controlling a differential-drive mobile robot has been conducted to demonstrate that the proposed method can preliminarily achieve the design goal.
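The abstract does not include code, but the Dyna loop it describes (direct Q-learning from real transitions, a learned one-step model, and prioritized sweeping over that model between real interactions) can be sketched compactly. The sketch below is a minimal illustration, not the paper's implementation: it substitutes tabular structures for the CMAC-model and CMAC-R-model approximators, and the class and parameter names (DynaAgent, planning_steps, theta) are hypothetical.

```python
# Minimal sketch of a Dyna-style agent with prioritized sweeping.
# Tabular Q-values and a tabular model stand in for the paper's CMAC
# approximators purely to keep the example short.
import heapq
import random
from collections import defaultdict


class DynaAgent:
    def __init__(self, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1,
                 theta=1e-3, planning_steps=20):
        self.n_actions = n_actions
        self.alpha = alpha                    # learning rate
        self.gamma = gamma                    # discount factor
        self.epsilon = epsilon                # epsilon-greedy exploration
        self.theta = theta                    # priority threshold for sweeping
        self.planning_steps = planning_steps  # virtual updates per real step
        self.q = defaultdict(float)           # (state, action) -> value
        self.model = {}                       # (state, action) -> (reward, next_state)
        self.predecessors = defaultdict(set)  # next_state -> {(state, action)}
        self.pqueue = []                      # max-priority queue (negated min-heap)

    def act(self, state):
        # Epsilon-greedy action selection over the learned Q-values.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        values = [self.q[(state, a)] for a in range(self.n_actions)]
        return values.index(max(values))

    def _td_error(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in range(self.n_actions))
        return reward + self.gamma * best_next - self.q[(state, action)]

    def learn(self, state, action, reward, next_state):
        # 1) Direct Q-learning update from the real transition.
        self.q[(state, action)] += self.alpha * self._td_error(
            state, action, reward, next_state)
        # 2) Update the (deterministic, tabular) environment model.
        self.model[(state, action)] = (reward, next_state)
        self.predecessors[next_state].add((state, action))
        # 3) Queue the pair for planning if its remaining TD error is large.
        priority = abs(self._td_error(state, action, reward, next_state))
        if priority > self.theta:
            heapq.heappush(self.pqueue, (-priority, (state, action)))
        # 4) Planning: replay virtual experience from the model, sweeping
        #    backwards to predecessors of states whose values changed.
        for _ in range(self.planning_steps):
            if not self.pqueue:
                break
            _, (s, a) = heapq.heappop(self.pqueue)
            r, s2 = self.model[(s, a)]
            self.q[(s, a)] += self.alpha * self._td_error(s, a, r, s2)
            for (ps, pa) in self.predecessors[s]:
                pr, _ = self.model[(ps, pa)]
                p = abs(self._td_error(ps, pa, pr, s))
                if p > self.theta:
                    heapq.heappush(self.pqueue, (-p, (ps, pa)))
```

In the paper's setting, the tabular `q` and `model` dictionaries would be replaced by CMAC function approximators, so that the same planning loop generalizes over the robot's continuous state space.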