Deep reinforcement learning in a spatial navigation task: Multiple contexts and their representation

Deep learning has recently been combined with Q-learning (Mnih et al., 2015) to enable learning of difficult tasks, such as playing video games, from visual input alone. Stable learning in the deep Q-network (DQN) is facilitated by memory replay: previous experiences are stored and sampled from during an offline learning period. We evaluate the DQN's ability to learn and retain multiple variations of a spatial navigation task in a virtual environment. Task variations are presented in visually distinct contexts created by varying light conditions and environmental textures. Replay memory capacity is varied to measure its effect on task retention. The representations of multiple contexts learned by the DQN agents are analyzed and compared. We show that DQN agents learn a preference for common actions early on, irrespective of replay memory capacity. A limited replay memory causes agents to confuse state values. Furthermore, we find that contexts are quickly forgotten as soon as the corresponding experiences are no longer available in the replay memory.
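The replay mechanism at the center of this study can be made concrete with a short sketch. The snippet below is a minimal, illustrative example rather than the authors' implementation: a fixed-capacity replay buffer (capacity being the quantity varied in the study) together with the one-step Q-learning target r + γ max_a' Q(s', a') used to train a DQN. The class and function names, the toy state and action dimensions, and the stand-in target-network outputs are assumptions made only for this example.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-capacity FIFO buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen discards the oldest transitions once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling, as in standard DQN experience replay.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


def q_learning_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step Q-learning targets: r + gamma * max_a' Q(s', a'), zeroed at terminal steps."""
    return rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    buffer = ReplayBuffer(capacity=1000)  # replay memory capacity (hypothetical value)
    # Fill the buffer with random transitions from a toy 4-dimensional state, 3-action task.
    for _ in range(200):
        s, s_next = rng.normal(size=4), rng.normal(size=4)
        buffer.push(s, int(rng.integers(3)), float(rng.normal()), s_next, rng.random() < 0.05)
    states, actions, rewards, next_states, dones = buffer.sample(32)
    # Stand-in for Q(s', .) from a target network; a real agent would query its network here.
    fake_next_q = rng.normal(size=(32, 3))
    targets = q_learning_targets(rewards, fake_next_q, dones.astype(np.float64))
    print(targets.shape)  # (32,)
```

Because the buffer is a bounded FIFO, transitions from a context that is no longer visited are eventually overwritten, which is the mechanism behind the forgetting effect described in the abstract.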

[1] D. T. Lee et al. Two algorithms for constructing a Delaunay triangulation, 1980, International Journal of Computer & Information Sciences.

[2] James L. McClelland et al. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory, 1995, Psychological Review.

[3] Richard S. Sutton et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[4] Tom Schaul et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.

[5] Long Ji Lin et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.

[6] N. White. Reward: What Is It? How Can It Be Inferred from Behavior?, 2011.

[7] John N. Tsitsiklis et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.

[8] Jason S. Snyder et al. Complementary activation of hippocampal–cortical subregions and immature neurons following chronic training in single and multiple context versions of the water maze, 2012, Behavioural Brain Research.

[9] Guigang Zhang et al. Deep Learning, 2016, Int. J. Semantic Comput.

[10] R. Rescorla. Pavlovian conditioning: It's not what you think it is, 1988, The American Psychologist.

[11] Ben J. A. Kröse et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.

[12] M. Bouton. Learning and Behavior: A Contemporary Synthesis, 2006.

[13] Carl D. Cheney et al. Behavior Analysis and Learning, 1998.

[14] Heng Tao Shen et al. Principal Component Analysis, 2009, Encyclopedia of Biometrics.

[15] R. Morris. Spatial Localization Does Not Require the Presence of Local Cues, 1981.

[16] Alex Graves et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.

[17] W. Schultz. Neuronal Reward and Decision Signals: From Theories to Data, 2015, Physiological Reviews.

[18] Shane Legg et al. Human-level control through deep reinforcement learning, 2015, Nature.

[19] Brendan McCane et al. Pseudo-Rehearsal: Achieving Deep Reinforcement Learning without Catastrophic Forgetting, 2018, Neurocomputing.

[20] Razvan Pascanu et al. Overcoming catastrophic forgetting in neural networks, 2016, Proceedings of the National Academy of Sciences.

[21] Demis Hassabis et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.