Paper Information - Neural Episodic Control

Neural Episodic Control

Deep reinforcement learning methods attain super-human performance in a wide range of environments. However, such methods are grossly inefficient, often taking orders of magnitude more data than humans to achieve reasonable performance. We propose Neural Episodic Control: a deep reinforcement learning agent that is able to rapidly assimilate new experiences and act upon them. Our agent uses a semi-tabular representation of the value function: a buffer of past experience containing slowly changing state representations and rapidly updated estimates of the value function. We show across a wide range of environments that our agent learns significantly faster than other state-of-the-art, general-purpose deep reinforcement learning agents.
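To make the "semi-tabular" idea concrete, the following is a minimal sketch of a buffer that pairs state representations (keys) with value estimates (values). It is an illustration under stated assumptions, not the paper's implementation: the class name `EpisodicValueBuffer`, its parameters, the inverse-distance weighting used for lookups, and the use of fixed tuples as stand-ins for learned embeddings are all hypothetical choices made here, since the abstract does not specify them.

```python
class EpisodicValueBuffer:
    """Toy key-value memory (hypothetical): keys stand in for state
    embeddings, values are value-function estimates."""

    def __init__(self, num_neighbours=2, learning_rate=0.5, delta=1e-3):
        self.keys = []            # stored state representations
        self.values = []          # value estimate for each stored key
        self.k = num_neighbours   # neighbours used per lookup
        self.lr = learning_rate   # step size of the tabular update
        self.delta = delta        # keeps the kernel finite at distance 0

    @staticmethod
    def _sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def lookup(self, key):
        """Estimate a value as a weighted average over the nearest keys."""
        if not self.keys:
            return 0.0
        nearest = sorted(
            (self._sq_dist(key, k), v) for k, v in zip(self.keys, self.values)
        )[: self.k]
        # Assumed kernel: inverse squared distance (not given in the abstract).
        weights = [1.0 / (d + self.delta) for d, _ in nearest]
        total = sum(weights)
        return sum(w * v for w, (_, v) in zip(weights, nearest)) / total

    def write(self, key, target):
        """Fast update: move an existing entry toward the target,
        or append a new entry if the key has not been seen."""
        key = tuple(key)
        for i, stored in enumerate(self.keys):
            if stored == key:
                self.values[i] += self.lr * (target - self.values[i])
                return
        self.keys.append(key)
        self.values.append(float(target))


buffer = EpisodicValueBuffer(num_neighbours=1)
buffer.write((0.0, 0.0), 1.0)   # new experience is stored immediately
buffer.write((0.0, 0.0), 3.0)   # repeated key: estimate moves toward 3.0
estimate = buffer.lookup((0.0, 0.0))
```

In this sketch a single write changes the estimate returned for that state right away, which is the "rapidly updated" half of the description. The "slowly changing" half would correspond to the keys being produced by a gradually trained network, which is omitted here.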
