Episodic Memory Deep Q-Networks

Reinforcement learning (RL) algorithms have made huge progress in recent years by leveraging the power of deep neural networks (DNNs). Despite this success, deep RL algorithms are known to be sample inefficient, often requiring many rounds of interaction with the environment to obtain satisfactory performance. Recently, episodic memory based RL has attracted attention due to its ability to latch onto good actions quickly. In this paper, we present a simple yet effective biologically inspired RL algorithm called Episodic Memory Deep Q-Networks (EMDQN), which leverages episodic memory to supervise an agent during training. Experiments show that our proposed method leads to better sample efficiency and is more likely to find good policies. It requires only 1/5 of the interactions of DQN to achieve state-of-the-art performance on many Atari games, significantly outperforming regular DQN and other episodic memory based RL algorithms.
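The abstract does not spell out how episodic memory supervises the agent, so the following is only a minimal sketch under stated assumptions: the memory is a table holding the best discounted return observed for each (state, action) pair, and the training loss adds a weighted squared error toward that stored return on top of the usual one-step TD error. The names `EpisodicMemory`, `combined_loss`, and `memory_weight` are hypothetical and chosen for illustration.

```python
class EpisodicMemory:
    """Hypothetical table: (state, action) -> best discounted return seen so far."""

    def __init__(self):
        self.table = {}

    def update_from_episode(self, episode, gamma):
        # episode is a list of (state, action, reward) tuples in time order.
        # Walk backwards so each step's discounted return is built incrementally.
        ret = 0.0
        for state, action, reward in reversed(episode):
            ret = reward + gamma * ret
            key = (state, action)
            # Assumption: keep only the highest return ever observed for this pair.
            if key not in self.table or ret > self.table[key]:
                self.table[key] = ret

    def lookup(self, state, action):
        # Returns None when the pair has never been visited.
        return self.table.get((state, action))


def combined_loss(q_value, td_target, memory_target, memory_weight=0.1):
    """Squared TD error plus an assumed memory-supervision term."""
    td_loss = (q_value - td_target) ** 2
    if memory_target is None:
        # No memory entry: fall back to the plain TD loss.
        return td_loss
    return td_loss + memory_weight * (q_value - memory_target) ** 2


memory = EpisodicMemory()
memory.update_from_episode([("s0", "a", 0.0), ("s1", "b", 1.0)], gamma=0.5)
loss = combined_loss(1.0, 2.0, memory.lookup("s0", "a"))
```

In this sketch the memory term pulls the Q-value toward the best return already achieved, which is one way a single good episode could influence learning quickly; the actual weighting and memory structure used by EMDQN may differ.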

Guangwen Yang | Tianqi Zhao | Lintao Zhang | Zichuan Lin
