A Survey of Deep Reinforcement Learning in Video Games

Deep reinforcement learning (DRL) has achieved remarkable success since it was first proposed. In general, a DRL agent receives high-dimensional observations at each time step and takes actions according to a policy parameterized by a deep neural network; the policy is updated end-to-end to maximize the expected return. In this paper, we survey the progress of DRL methods, including value-based, policy gradient, and model-based algorithms, and compare their main techniques and properties. Moreover, DRL plays an important role in game artificial intelligence (AI). We also review the achievements of DRL in various video games, from classical arcade games to first-person perspective games and multi-agent real-time strategy games, spanning 2D to 3D and single-agent to multi-agent settings. Many DRL-based game AIs have achieved superhuman performance, yet significant challenges remain in this domain. We therefore discuss key issues in applying DRL to video games, including the exploration-exploitation trade-off, sample efficiency, generalization and transfer, multi-agent learning, imperfect information, and delayed sparse rewards, and outline promising research directions.
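To make the learning loop described above concrete, the sketch below implements a REINFORCE-style policy gradient: the agent observes a state, samples an action from a neural-network policy, and updates the network end-to-end to maximize the discounted return. This is a minimal illustrative sketch, not the survey's method; the environment (CartPole-v1 via the Gymnasium API), network size, and hyperparameters are all assumptions chosen for brevity.

```python
# Minimal REINFORCE-style policy-gradient sketch (illustrative only).
# Assumes the gymnasium and torch packages; CartPole-v1, the network
# width, and all hyperparameters are arbitrary example choices.
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 64),
    nn.ReLU(),
    nn.Linear(64, env.action_space.n),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
gamma = 0.99  # discount factor

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        # Sample an action from the network's categorical policy.
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        obs, reward, terminated, truncated, _ = env.step(action.item())
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)
        done = terminated or truncated

    # Compute the discounted return G_t for every step of the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.as_tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Policy-gradient update: ascend E[G_t * grad log pi(a_t | s_t)].
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The same interaction-and-update pattern underlies the value-based, policy gradient, and model-based families surveyed here; they differ mainly in what the network estimates and how the update target is constructed.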
