Algorithms for computing strategies in two-player simultaneous move games

Abstract

Simultaneous move games model discrete, multistage interactions in which, at each stage, the players choose their actions simultaneously. At each stage, a player does not know which action the other player will take, but otherwise knows the full state of the game. This formalism has been used to express games in general game playing and can also model many discrete approximations of real-world scenarios. In this paper, we describe both novel and existing algorithms that compute strategies for the class of two-player zero-sum simultaneous move games. The algorithms include exact backward induction methods with efficient pruning, as well as Monte Carlo sampling algorithms. We evaluate the algorithms in two different settings: the offline case, where computational resources are abundant and closely approximating the optimal strategy is a priority, and the online search case, where computational resources are limited and acting quickly is necessary. We perform a thorough experimental evaluation on six substantially different games for both settings. For the exact algorithms, the results show that our pruning techniques for backward induction dramatically reduce the computation time required by previous exact algorithms. For the sampling algorithms, the results provide unique insights into their performance and identify favorable settings and domains for different sampling algorithms.
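To make the exact setting concrete, the following is a minimal sketch of plain backward induction for a two-player zero-sum simultaneous move game, without the pruning the paper contributes: the values of the subgames at each stage form a matrix game, which is solved exactly by linear programming. The `game` interface (`is_terminal`, `utility`, `actions`, `next_state`) is a hypothetical stand-in for a concrete domain.

```python
# Minimal backward induction for two-player zero-sum simultaneous move
# games (a sketch; the paper's exact algorithms add pruning on top of
# this). The `game` interface used below is hypothetical.
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(A):
    """Solve the zero-sum matrix game A (row player's payoffs).
    Returns (game value, optimal row mixed strategy)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # Variables: x_1..x_m (row strategy) and v (game value).
    # Maximize v  s.t.  (A^T x)_j >= v for every column j,
    #                   sum(x) = 1,  x >= 0,  v free.
    c = np.zeros(m + 1)
    c[-1] = -1.0                                # linprog minimizes, so use -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])   # v - (A^T x)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0                           # sum(x) = 1, v excluded
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]

def backward_induction(game, state):
    """Exact value of `state` for player 1 (no pruning)."""
    if game.is_terminal(state):
        return game.utility(state)
    rows, cols = game.actions(state)  # action sets of the two players
    A = np.array([[backward_induction(game, game.next_state(state, r, c))
                   for c in cols] for r in rows])
    value, _ = solve_matrix_game(A)
    return value
```

For instance, `solve_matrix_game([[0, 1, -1], [-1, 0, 1], [1, -1, 0]])` returns value 0 and the uniform strategy for rock-paper-scissors. Pruning methods improve on this baseline by using bounds on subgame values to avoid evaluating some entries of A at all.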

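On the sampling side, one selection rule used at simultaneous-move nodes by MCTS variants is regret matching. The sketch below, again an illustration under stated assumptions rather than the paper's exact variant, runs sampled regret-matching self-play on a known zero-sum payoff matrix; the row player's average strategy approaches a minimax strategy.

```python
# Sampled regret-matching self-play on a zero-sum payoff matrix A
# (row player maximizes A[i][j], column player minimizes). A sketch of
# the selection rule used at simultaneous-move nodes by some sampling
# algorithms; not the paper's exact algorithm.
import random

def regret_matching(A, iterations=100_000):
    m, n = len(A), len(A[0])
    regret_row = [0.0] * m
    regret_col = [0.0] * n
    strategy_sum = [0.0] * m              # accumulates row strategies

    def current(regrets):
        # Play proportionally to positive regret; uniform if none.
        pos = [max(r, 0.0) for r in regrets]
        total = sum(pos)
        k = len(regrets)
        return [p / total for p in pos] if total > 0 else [1.0 / k] * k

    for _ in range(iterations):
        sigma_row = current(regret_row)
        sigma_col = current(regret_col)
        for a in range(m):
            strategy_sum[a] += sigma_row[a]
        i = random.choices(range(m), weights=sigma_row)[0]
        j = random.choices(range(n), weights=sigma_col)[0]
        # Each player updates regret for every action it could have
        # played against the opponent's realized action.
        for a in range(m):
            regret_row[a] += A[a][j] - A[i][j]
        for b in range(n):
            regret_col[b] += A[i][j] - A[i][b]

    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]
```

For rock-paper-scissors, `regret_matching([[0, 1, -1], [-1, 0, 1], [1, -1, 0]])` returns an average strategy close to (1/3, 1/3, 1/3). Because only sampled play is needed rather than exact subgame values, this style of update scales to the online search setting where acting quickly matters.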