Sample Efficient Actor-Critic with Experience Replay

This paper presents an actor-critic deep reinforcement learning agent with experience replay that is stable, sample efficient, and performs remarkably well on challenging environments, including the discrete 57-game Atari domain and several continuous control problems. To achieve this, the paper introduces several innovations, including truncated importance sampling with bias correction, stochastic dueling network architectures, and a new trust region policy optimization method.
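
To make the first of these innovations concrete, the sketch below illustrates truncated importance sampling with bias correction in the simplest setting: a single state with discrete actions and known behavior- and target-policy probabilities. This is a minimal illustration, not the paper's implementation; the function name, the NumPy setup, and the threshold value c are assumptions, and the sketch further assumes the behavior probabilities mu are strictly positive.

```python
import numpy as np

def truncated_is_with_bias_correction(pi, mu, q, a, c=10.0):
    """Estimate E_{a~pi}[Q(x, a)] from one action sampled under mu.

    The importance weight rho = pi/mu is truncated at c to bound the
    variance of the sampled term; the bias this introduces is removed
    by a correction term computed exactly as a sum over all actions.

    pi, mu : target / behavior policy probabilities, shape (A,), mu > 0
    q      : critic estimates Q(x, .) for every action, shape (A,)
    a      : index of the action actually drawn from mu
    c      : truncation threshold on the importance weight (assumed value)
    """
    rho = pi / mu
    # Sampled term, with the importance weight clipped at c.
    truncated = min(rho[a], c) * q[a]
    # Correction term: E_{a~pi}[ [(rho - c)/rho]_+ * Q ], computed exactly,
    # which is nonzero only for actions whose weight was clipped.
    residual = np.clip(1.0 - c / rho, 0.0, None)
    correction = np.sum(pi * residual * q)
    return truncated + correction

# Tiny usage example with four discrete actions.
rng = np.random.default_rng(0)
mu = np.full(4, 0.25)                # behavior policy (uniform)
pi = np.array([0.7, 0.1, 0.1, 0.1])  # target policy
q = np.array([1.0, 0.5, -0.2, 0.3])  # critic values Q(x, .)
a = rng.choice(4, p=mu)              # single off-policy sample
print(truncated_is_with_bias_correction(pi, mu, q, a))
```

Averaged over draws of a from mu, the clipped term plus the exact correction recovers the untruncated expectation, so the estimator stays unbiased while its variance is bounded by the threshold. In the paper itself this decomposition is applied per time step to the policy-gradient terms (with Retrace targets for the critic) rather than to Q directly; the sketch only shows the estimator's structure.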
