Generalization of Reinforcement Learners with Working and Episodic Memory

Memory is an important aspect of intelligence and plays a role in many deep reinforcement learning models. However, little progress has been made in understanding when specific memory systems help more than others and how well they generalize. The field also lacks a widely adopted, consistent, and rigorous approach for evaluating agent performance on holdout data. In this paper, we aim to develop a comprehensive methodology for testing different kinds of memory in an agent and for assessing how well the agent can apply what it learns in training to a holdout set that differs from the training set along dimensions that we suggest are relevant for evaluating memory-specific generalization. To that end, we first construct a diverse set of memory tasks that allow us to evaluate test-time generalization across multiple dimensions. Second, we develop an agent architecture that combines multiple memory systems, perform multiple ablations on it, evaluate its baseline models, and investigate its performance on the task suite.
