PARAMETRIZED DEEP Q-NETWORKS LEARNING: PLAYING ONLINE BATTLE ARENA WITH DISCRETE-CONTINUOUS HYBRID ACTION SPACE

Most existing deep reinforcement learning (DRL) frameworks consider action spaces that are either discrete or continuous. Motivated by the project of designing game AI for King of Glory (KOG), one of the world’s most popular mobile games, we consider the scenario with a discrete-continuous hybrid action space. To directly apply existing DRL frameworks, existing approaches either approximate the hybrid space by a discrete set or relax it into a continuous set, which is usually less efficient and robust. In this paper, we propose a parametrized deep Q-network (P-DQN) framework for the hybrid action space without approximation or relaxation. Our algorithm combines DQN and DDPG and can be viewed as an extension of DQN to hybrid actions. The empirical study on the game KOG validates the efficiency and effectiveness of our method.
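
To make the idea of a hybrid action concrete, here is a minimal sketch of how action selection might look when a DDPG-style deterministic network proposes a continuous parameter for each discrete action and a DQN-style network scores the resulting pairs. The abstract only states that the method combines DQN and DDPG, so this structure is an assumption, and every name here (`ACTIONS`, `param_net`, `q_net`, `select_action`) and the toy functions standing in for trained networks are hypothetical.

```python
import math
import random

# Hypothetical hybrid action space: each discrete action k has its own
# continuous parameter range. Names and ranges are illustrative only.
ACTIONS = {
    "move": (-math.pi, math.pi),  # heading angle in radians
    "attack": (0.0, 1.0),         # normalized strength
}


def param_net(state, k):
    """Toy stand-in for a deterministic parameter network x_k(s)."""
    lo, hi = ACTIONS[k]
    score = sum(state) if k == "move" else state[0] - state[1]
    z = math.tanh(score)  # squash to (-1, 1)
    return lo + (hi - lo) * (z + 1.0) / 2.0  # rescale into [lo, hi]


def q_net(state, k, x):
    """Toy stand-in for an action-value network Q(s, k, x_k)."""
    if k == "move":
        return -abs(x - state[0])
    return x * state[1]


def select_action(state, epsilon=0.0, rng=random):
    """Return (k, x_k): propose x_k for every k, then take argmax_k Q."""
    params = {k: param_net(state, k) for k in ACTIONS}
    if rng.random() < epsilon:
        # Exploration: uniform discrete action with a uniform parameter.
        k = rng.choice(sorted(ACTIONS))
        lo, hi = ACTIONS[k]
        return k, rng.uniform(lo, hi)
    best_k = max(ACTIONS, key=lambda k: q_net(state, k, params[k]))
    return best_k, params[best_k]
```

Calling `select_action([0.5, 2.0])` returns a pair of a discrete action name and a continuous parameter inside that action's range. No discretization of the continuous part and no relaxation of the discrete part is needed in this sketch, which is the property the abstract emphasizes.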

Han Liu | Yang Zheng | Ji Liu | Zhuoran Yang | Xiangru Lian | Carson Eisenach | Peng Sun | Qing Wang | Tong Zhang | Bei Peng | Haichuan Yang | Lei Han | Jiechao Xiong | Haobo Fu | Emmanuel Ekwedike | Haoyue Gao
