Reinforcement Learning in Games

Reinforcement learning and games have a long and mutually beneficial common history. On one side, games are rich and challenging domains for testing reinforcement learning algorithms; on the other, in several games the best computer players use reinforcement learning. The chapter begins with a selection of games and notable reinforcement learning implementations. Without modification, the basic reinforcement learning algorithms are rarely sufficient for high-level gameplay, so it is essential to discuss the additional ideas, the ways of inserting domain knowledge, and the implementation decisions that are necessary for scaling up. These are reviewed in sufficient detail to understand their potential and their limitations. The second part of the chapter lists challenges for reinforcement learning in games, together with a review of proposed solution methods. While this listing has a game-centric viewpoint, and some of the items are specific to games (such as opponent modelling), a large portion of the overview can provide insight for other kinds of applications as well. In the third part we review how reinforcement learning can be useful in game development and find its way into commercial computer games. Finally, we provide pointers to more in-depth reviews of specific games and solution approaches.
