Using neural networks as function approximators in temporal-difference reinforcement learning problems has proved very effective in dealing with the high dimensionality of the input state space, especially in more recent developments such as Deep Q-learning. These approaches share a mechanism, called experience replay, that stores past experiences in a memory buffer and samples them uniformly for re-learning, thus improving the efficiency of the learning process. To increase learning performance, techniques such as prioritized experience and prioritized sampling have been introduced; they deal with storing and replaying, respectively, the transitions with larger TD error. In this paper, we present a concept called Attention-Based Experience REplay (ABERE), which selectively focuses the replay buffer on specific types of experiences and thereby models the behavioral characteristics of the learning agent in single-agent and multi-agent environments. We further explore how different behavioral characteristics influence the performance of agents faced with a dynamic environment that can become more hostile or more benevolent by changing the relative probability of receiving positive or negative reinforcement.
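The replay mechanism described above can be illustrated with a minimal sketch. It shows a fixed-capacity buffer with uniform sampling, plus a weighted variant that biases replay toward a chosen type of experience. The abstract does not specify how ABERE selects experiences, so the names `ReplayBuffer`, `sample_focused`, and `optimistic_focus`, and the choice to weight transitions by reward sign, are assumptions made only for illustration.

```python
import random
from collections import deque, namedtuple

# One stored experience: (s, a, r, s', done).
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-capacity experience buffer (illustrative sketch, not the paper's code)."""

    def __init__(self, capacity, seed=None):
        # A bounded deque evicts the oldest transition once capacity is reached.
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def store(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample_uniform(self, batch_size):
        # Standard experience replay: every stored transition is equally likely.
        return self.rng.sample(list(self.memory), batch_size)

    def sample_focused(self, batch_size, focus):
        # Hypothetical selective focusing: `focus` maps a transition to a
        # non-negative weight, and sampling (with replacement) follows those weights.
        weights = [focus(t) for t in self.memory]
        return self.rng.choices(list(self.memory), weights=weights, k=batch_size)


def optimistic_focus(transition, bias=4.0):
    # Assumed behavioral profile: positively rewarded experiences are
    # `bias` times more likely to be replayed than the others.
    return bias if transition.reward > 0 else 1.0


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=100, seed=0)
    for step in range(50):
        reward = 1.0 if step % 5 == 0 else -1.0
        buffer.store(step, 0, reward, step + 1, False)

    uniform_batch = buffer.sample_uniform(8)
    focused_batch = buffer.sample_focused(8, optimistic_focus)
    print(len(uniform_batch), len(focused_batch))
```

Under these assumptions, swapping the `focus` function is one way to represent different behavioral characteristics, for example a profile that attends mostly to negative reinforcement, without changing the rest of the learning loop.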