论文信息 - Combo-Action: Training Agent For FPS Game with Auxiliary Tasks

Combo-Action: Training Agent For FPS Game with Auxiliary Tasks

Deep reinforcement learning (DRL) has achieved surpassing human performance on Atari games, using raw pixels and rewards to learn everything. However, first-person-shooter (FPS) games in 3D environments contain higher levels of human concepts (enemy, weapon, spatial structure, etc.) and a large action space. In this paper, we explore a novel method which can plan on temporally-extended action sequences, which we refer as Combo-Action to compress the action space. We further train a deep recurrent Q-learning network model as a high-level controller, called supervisory network, to manage the Combo-Actions. Our method can be boosted with auxiliary tasks (enemy detection and depth prediction), which enable the agent to extract high-level concepts in the FPS games. Extensive experiments show that our method is efficient in training process and outperforms previous stateof-the-art approaches by a large margin. Ablation study experiments also indicate that our method can boost the performance of the FPS agent in a reasonable way.

[1] Guillaume Lample,et al. Playing FPS Games with Deep Reinforcement Learning , 2016, AAAI.

[2] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.

[3] Razvan Pascanu,et al. Learning to Navigate in Complex Environments , 2016, ICLR.

[4] Jakub W. Pachocki,et al. Emergent Complexity via Multi-Agent Competition , 2017, ICLR.

[5] Peter Stone,et al. Deep Recurrent Q-Learning for Partially Observable MDPs , 2015, AAAI Fall Symposia.

[6] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[7] Doina Precup,et al. Temporal abstraction in reinforcement learning , 2000, ICML 2000.

[8] Vladlen Koltun,et al. Learning to Act by Predicting the Future , 2016, ICLR.

[9] Vincent Conitzer,et al. AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents , 2003, Machine Learning.

[10] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[11] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..

[13] Pietro Perona,et al. Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14] Nicholas Roy,et al. PUMA: Planning Under Uncertainty with Macro-Actions , 2010, AAAI.

[15] Yuandong Tian,et al. Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning , 2016, ICLR.

[16] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[17] Pieter Abbeel,et al. Meta Learning Shared Hierarchies , 2017, ICLR.

[18] Dumitru Erhan,et al. Scalable Object Detection Using Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[19] Deva Ramanan,et al. Expecting the Unexpected: Training Detectors for Unusual Pedestrians with Adversarial Imposters , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992 .

[21] Wojciech Jaskowski,et al. ViZDoom: A Doom-based AI research platform for visual reinforcement learning , 2016, 2016 IEEE Conference on Computational Intelligence and Games (CIG).

[22] Demis Hassabis,et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm , 2017, ArXiv.

[23] Murray Shanahan,et al. Classifying Options for Deep Reinforcement Learning , 2016, ArXiv.

[24] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.