Induced Exploration on Policy Gradients by Increasing Actor Entropy Using Advantage Target Regions
[1] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[2] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[3] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[4] Nahum Shimkin, et al. Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning, 2016, ICML.
[5] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[6] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[7] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[8] Richard E. Turner, et al. Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning, 2017, NIPS.
[9] Wojciech Jaśkowski, et al. ViZDoom: A Doom-based AI research platform for visual reinforcement learning, 2016, IEEE Conference on Computational Intelligence and Games (CIG).
[10] Christopher Schulze, et al. ViZDoom: DRQN with Prioritized Experience Replay, Double-Q Learning, & Snapshot Ensembling, 2018, IntelliSys.
[11] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.