
Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning

In this paper, we propose a new framework for training a vision-based agent for First-Person Shooter (FPS) games, in particular Doom. Our framework combines a state-of-the-art reinforcement learning approach, the Asynchronous Advantage Actor-Critic (A3C) model [Mnih et al. (2016)], with curriculum learning. Our model is simple in design and uses only game states from the AI side, rather than opponents' information [Lample & Chaplot (2016)]. On a known map, our agent won 10 of the 11 games it played and was the champion of Track 1 in the ViZDoom AI Competition 2016 by a large margin, scoring 35% higher than the second-place entry.
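To make the two ingredients named in the abstract concrete, here is a minimal sketch of (a) the advantage estimate an actor-critic method uses to weight policy updates and (b) a simple curriculum rule that raises task difficulty once the agent performs well enough. This is not the authors' implementation: the function names, the discount factor, the score threshold, and the notion of integer difficulty levels are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def n_step_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    bootstrap_value: float,
    gamma: float = 0.99,  # assumed discount factor, not taken from the paper
) -> Tuple[List[float], List[float]]:
    """Compute discounted returns and advantages for one rollout segment.

    rewards[t] is the reward after step t, values[t] is the critic's
    estimate V(s_t), and bootstrap_value is V of the state after the
    last step (0.0 if the episode ended).
    """
    returns: List[float] = []
    running = bootstrap_value
    # Walk backwards so each return reuses the one after it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    # Advantage: how much better the observed return was than predicted.
    advantages = [ret - v for ret, v in zip(returns, values)]
    return returns, advantages


def next_difficulty(
    level: int,
    recent_scores: Sequence[float],
    threshold: float,  # hypothetical promotion threshold
    max_level: int,
) -> int:
    """Move to a harder level once the mean recent score reaches threshold."""
    if not recent_scores or level >= max_level:
        return level
    mean_score = sum(recent_scores) / len(recent_scores)
    return level + 1 if mean_score >= threshold else level
```

In a training loop, each worker would call `n_step_advantages` on its latest rollout to form the policy and value losses, and call `next_difficulty` periodically to decide whether to switch to a harder game configuration.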

Yuandong Tian | Yuxin Wu

[1] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[2] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.

[3] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.

[4] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.

[5] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.

[6] Stefan Schaal, et al. 2008 Special Issue: Reinforcement learning of motor skills with policy gradients, 2008.

[7] Jason Weston, et al. Curriculum learning, 2009, ICML '09.

[8] Sam Devlin, et al. An Empirical Study of Potential-Based Reward Shaping and Advice in Complex, Multi-Agent Systems, 2011, Adv. Complex Syst.

[9] Robert Babuska, et al. A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients, 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[10] Shiguang Shan, et al. Self-Paced Curriculum Learning, 2015, AAAI.

[11] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[12] Peter Stone, et al. Deep Recurrent Q-Learning for Partially Observable MDPs, 2015, AAAI Fall Symposia.

[13] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.

[14] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.

[15] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, ArXiv.

[16] Wojciech Jaskowski, et al. ViZDoom: A Doom-based AI research platform for visual reinforcement learning, 2016, 2016 IEEE Conference on Computational Intelligence and Games (CIG).

[17] L. Citi, et al. Clyde: A Deep Reinforcement Learning DOOM Playing Agent, 2017, AAAI Workshops.

[18] Stephen Tyree, et al. Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU, 2016, ICLR.

[19] Vladlen Koltun, et al. Learning to Act by Predicting the Future, 2016, ICLR.

[20] Guillaume Lample, et al. Playing FPS Games with Deep Reinforcement Learning, 2016, AAAI.