Deep Learning for Video Game Playing

In this paper, we review recent deep learning advances and how they have been applied to play different types of video games, such as first-person shooters, arcade games, and real-time strategy games. We analyze the unique requirements that different game genres pose to a deep learning system and highlight important open challenges in applying these machine learning methods to video games, such as general game playing, extremely large decision spaces, and sparse rewards.
