PARAMETRIZED DEEP Q-NETWORKS LEARNING: PLAYING ONLINE BATTLE ARENA WITH DISCRETE-CONTINUOUS HYBRID ACTION SPACE
Han Liu | Yang Zheng | Ji Liu | Zhuoran Yang | Xiangru Lian | Carson Eisenach | Peng Sun | Qing Wang | Tong Zhang | Bei Peng | Haichuan Yang | Lei Han | Jiechao Xiong | Haobo Fu | Emmanuel Ekwedike | Haoyue Gao