Neural Episodic Control

Deep reinforcement learning methods attain super-human performance in a wide range of environments. Such methods are grossly inefficient, often taking orders of magnitudes more data than humans to achieve reasonable performance. We propose Neural Episodic Control: a deep reinforcement learning agent that is able to rapidly assimilate new experiences and act upon them. Our agent uses a semi-tabular representation of the value function: a buffer of past experience containing slowly changing state representations and rapidly updated estimates of the value function. We show across a wide range of environments that our agent learns significantly faster than other state-of-the-art, general purpose deep reinforcement learning agents.

[1]  Jon Louis Bentley,et al.  Multidimensional binary search trees used for associative searching , 1975, CACM.

[2]  Geoffrey E. Hinton Using fast weights to deblur old memories , 1987 .

[3]  Michael McCloskey,et al.  Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem , 1989 .

[4]  Ben J. A. Kröse,et al.  Learning from delayed rewards , 1995, Robotics Auton. Syst..

[5]  Thomas G. Dietterich Machine learning , 1996, CSUR.

[6]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[7]  Ashwin Ram,et al.  Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces , 1997, Adapt. Behav..

[8]  Andrew W. Moore,et al.  Barycentric Interpolators for Continuous Space and Time Reinforcement Learning , 1998, NIPS.

[9]  Alex M. Andrew,et al.  Reinforcement Learning: : An Introduction , 1998 .

[10]  Jürgen Schmidhuber,et al.  A robot that reinforcement-learns to identify and memorize important previous observations , 2003, Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453).

[11]  Jing Peng,et al.  Incremental multi-step Q-learning , 2004, Machine Learning.

[12]  Martin A. Riedmiller,et al.  CBR for State Value Function Approximation in Reinforcement Learning , 2005, ICCBR.

[13]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 2005, IEEE Transactions on Neural Networks.

[14]  Richard S. Sutton,et al.  Learning to Predict by the Methods of Temporal Differences , 1988, Machine Learning.

[15]  Peter Dayan,et al.  Hippocampal Contributions to Control: The Third Way , 2007, NIPS.

[16]  Daan Wierstra,et al.  Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.

[17]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2014, ICLR.

[18]  Jason Weston,et al.  End-To-End Memory Networks , 2015, NIPS.

[19]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[20]  Honglak Lee,et al.  Action-Conditional Video Prediction using Deep Networks in Atari Games , 2015, NIPS.

[21]  Peter Stone,et al.  Deep Recurrent Q-Learning for Partially Observable MDPs , 2015, AAAI Fall Symposia.

[22]  Marc G. Bellemare,et al.  The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.

[23]  David Silver,et al.  Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.

[24]  Benjamin Van Roy,et al.  Deep Exploration via Bootstrapped DQN , 2016, NIPS.

[25]  Alex Graves,et al.  Strategic Attentive Writer for Learning Macro-Actions , 2016, NIPS.

[26]  Honglak Lee,et al.  Control of Memory, Active Perception, and Action in Minecraft , 2016, ICML.

[27]  Razvan Pascanu,et al.  Progressive Neural Networks , 2016, ArXiv.

[28]  David Silver,et al.  Learning functions across many orders of magnitudes , 2016, ArXiv.

[29]  Alex Graves,et al.  Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[30]  Joshua B. Tenenbaum,et al.  Building machines that learn and think like people , 2016, Behavioral and Brain Sciences.

[31]  Sergio Gomez Colmenarejo,et al.  Hybrid computing using a neural network with dynamic external memory , 2016, Nature.

[32]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[33]  James L. McClelland,et al.  What Learning Systems do Intelligent Agents Need? Complementary Learning Systems Theory Updated , 2016, Trends in Cognitive Sciences.

[34]  Marc G. Bellemare,et al.  Q($\lambda$) with Off-Policy Corrections , 2016 .

[35]  Peter L. Bartlett,et al.  RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning , 2016, ArXiv.

[36]  Joel Z. Leibo,et al.  Model-Free Episodic Control , 2016, ArXiv.

[37]  Jason Weston,et al.  Key-Value Memory Networks for Directly Reading Documents , 2016, EMNLP.

[38]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[39]  Tom Schaul,et al.  Prioritized Experience Replay , 2015, ICLR.

[40]  Geoffrey E. Hinton,et al.  Using Fast Weights to Attend to the Recent Past , 2016, NIPS.

[41]  Marc G. Bellemare,et al.  Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.

[42]  Zeb Kurth-Nelson,et al.  Learning to reinforcement learn , 2016, CogSci.

[43]  Chrisantha Fernando,et al.  PathNet: Evolution Channels Gradient Descent in Super Neural Networks , 2017, ArXiv.

[44]  Aurko Roy,et al.  Learning to Remember Rare Events , 2017, ICLR.

[45]  Yang Liu,et al.  Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening , 2016, ICLR.