Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning

In this paper, we propose a new framework for training a vision-based agent for First-Person Shooter (FPS) games, in particular Doom. Our framework combines a state-of-the-art reinforcement learning approach, the Asynchronous Advantage Actor-Critic (A3C) model [Mnih et al. (2016)], with curriculum learning. Our model is simple in design and uses only game states from the AI side, rather than opponents' information [Lample & Chaplot (2016)]. On a known map, our agent won 10 out of the 11 games it attended and became the champion of Track 1 in the ViZDoom AI Competition 2016 by a large margin, scoring 35% higher than the second-place entry.
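To make the combination concrete, the sketch below illustrates, under our own assumptions rather than from the authors' released code, how an n-step advantage actor-critic update of the A3C style can be paired with a simple win-rate-based curriculum that gradually hardens the game settings. The helper names (a3c_losses, Curriculum), the stage parameters, and the threshold/window values are hypothetical choices for illustration only.

    import numpy as np

    GAMMA = 0.99          # discount factor
    ENTROPY_BETA = 0.01   # entropy regularization weight (illustrative value)

    def a3c_losses(rewards, values, log_probs, entropies, bootstrap_value):
        # One n-step rollout: accumulate policy-gradient and value losses
        # from n-step returns and advantage estimates, as in A3C.
        R = bootstrap_value
        policy_loss, value_loss = 0.0, 0.0
        for t in reversed(range(len(rewards))):
            R = rewards[t] + GAMMA * R            # n-step return
            advantage = R - values[t]             # advantage estimate
            policy_loss += -log_probs[t] * advantage - ENTROPY_BETA * entropies[t]
            value_loss += 0.5 * advantage ** 2
        return policy_loss, value_loss

    class Curriculum:
        # Advance to a harder game configuration once the agent's recent
        # win rate on the current stage exceeds a threshold (assumed rule).
        def __init__(self, stages, threshold=0.7, window=100):
            self.stages = stages      # e.g. progressively stronger or faster opponents
            self.threshold = threshold
            self.window = window
            self.level = 0
            self.results = []

        def report(self, won):
            self.results.append(float(won))
            if len(self.results) >= self.window:
                recent = self.results[-self.window:]
                if np.mean(recent) >= self.threshold:
                    self.level = min(self.level + 1, len(self.stages) - 1)
                    self.results = []

        def current_stage(self):
            return self.stages[self.level]

In such a setup, each asynchronous worker would pull the current stage's game settings from Curriculum.current_stage(), run an n-step rollout, compute its losses with a3c_losses, and apply an asynchronous gradient update to the shared network; the reported win/loss outcomes then drive the stage schedule.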

[1] Richard S. Sutton et al. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 1983.

[2] Richard S. Sutton et al. Temporal credit assignment in reinforcement learning, 1984.

[3] John N. Tsitsiklis et al. Actor-Critic Algorithms. NIPS, 1999.

[4] Andrew Y. Ng et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping. ICML, 1999.

[5] Ronald J. Williams et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning, 2004.

[6] Stefan Schaal et al. Reinforcement learning of motor skills with policy gradients, 2008.

[7] Jason Weston et al. Curriculum Learning. ICML, 2009.

[8] Sam Devlin et al. An Empirical Study of Potential-Based Reward Shaping and Advice in Complex, Multi-Agent Systems. Advances in Complex Systems, 2011.

[9] Robert Babuska et al. A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 2012.

[10] Shiguang Shan et al. Self-Paced Curriculum Learning. AAAI, 2015.

[11] Jimmy Ba et al. Adam: A Method for Stochastic Optimization. ICLR, 2014.

[12] Peter Stone et al. Deep Recurrent Q-Learning for Partially Observable MDPs. AAAI Fall Symposia, 2015.

[13] Alex Graves et al. Asynchronous Methods for Deep Reinforcement Learning. ICML, 2016.

[14] Demis Hassabis et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016.

[15] Martín Abadi et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv, 2016.

[16] Wojciech Jaskowski et al. ViZDoom: A Doom-based AI research platform for visual reinforcement learning. IEEE Conference on Computational Intelligence and Games (CIG), 2016.

[17] Clyde: A Deep Reinforcement Learning DOOM Playing Agent. AAAI Workshops, 2017.

[18] Stephen Tyree et al. Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU. ICLR, 2016.

[19] Vladlen Koltun et al. Learning to Act by Predicting the Future. ICLR, 2016.

[20] Guillaume Lample et al. Playing FPS Games with Deep Reinforcement Learning. AAAI, 2016.