Can Monte-Carlo Tree Search learn to sacrifice?

One of the most basic activities performed by an intelligent agent is deciding what to do next. The decision usually comes down to either selecting the move with the highest expected value or exploring new scenarios. Monte-Carlo Tree Search (MCTS), originally developed as a game-playing algorithm, handles this exploration–exploitation dilemma with a multi-armed bandit strategy. The success of MCTS across a wide range of problems, such as combinatorial optimisation, reinforcement learning, and games, stems from its ability to rapidly evaluate problem states without requiring domain-specific knowledge. However, the trade-off between exploration and exploitation is crucial to the algorithm's performance, and it affects how efficiently the agent learns deceptive states. One type of deception comprises states that yield immediate rewards but lead to a suboptimal solution in the long run; these are known as trap states and have been thoroughly investigated in previous research. In this work, we study the opposite of trap states, known as sacrifice states: deceptive moves that incur a local loss but are globally optimal. We investigate the efficiency of MCTS enhancements in identifying this type of move.
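The bandit strategy mentioned above is, in the standard UCT formulation, the UCB1 rule: each child of a tree node is scored by its average reward plus an exploration bonus that shrinks as the child is visited more often. A minimal sketch of that selection step follows (the function names and the dictionary-based child representation are illustrative, not taken from the paper):

```python
import math

def ucb1(total_value, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: average reward (exploitation) plus an exploration bonus."""
    if visits == 0:
        return float("inf")  # unvisited children are always tried first
    exploitation = total_value / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration

def select_child(children, parent_visits):
    """UCT selection step: pick the child maximizing the UCB1 score."""
    return max(
        children,
        key=lambda ch: ucb1(ch["value"], ch["visits"], parent_visits),
    )
```

The constant `c` controls the trade-off the abstract refers to: a small `c` makes the search greedy and prone to trap states (immediate reward, poor long-run value), while a larger `c` spends more simulations on apparently losing moves, which is what gives the search a chance to discover a sacrifice.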
