Nested Rollout Policy Adaptation with Selective Policies

Monte Carlo Tree Search (MCTS) is a general search algorithm that has improved the state of the art for multiple games and optimization problems. Nested Rollout Policy Adaptation (NRPA) is an MCTS variant that has found record-breaking solutions for puzzles and optimization problems. It learns a playout policy online, adapting the playouts to the problem at hand. We propose to enhance NRPA by making the playouts more selective. The idea is applied to three problems: bus regulation, SameGame, and weak Schur numbers. The selective variant improves on standard NRPA for all three problems.
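To make the online policy learning concrete, the following is a minimal sketch of standard NRPA, not of the selective variant proposed here. It assumes a toy problem invented for illustration: build a sequence of six digits and score it by the number of positions that match a fixed target. The names `TARGET`, `legal_moves`, `score`, and `code`, as well as the values of `alpha` and `iterations`, are assumptions and do not come from the paper.

```python
import math
import random

# Hypothetical toy objective: match this hidden sequence position by position.
TARGET = [2, 0, 1, 1, 2, 0]
MOVES = [0, 1, 2]

def legal_moves(state):
    """All digits are legal until the sequence is complete."""
    return MOVES if len(state) < len(TARGET) else []

def score(state):
    """Number of positions that agree with the target."""
    return sum(1 for a, b in zip(state, TARGET) if a == b)

def code(state, move):
    """Key of a move in the policy table: (position, digit)."""
    return (len(state), move)

def playout(policy, rng):
    """Sample a full sequence, choosing moves with softmax(policy) weights."""
    state = []
    while True:
        moves = legal_moves(state)
        if not moves:
            return score(state), state
        weights = [math.exp(policy.get(code(state, m), 0.0)) for m in moves]
        state = state + [rng.choices(moves, weights=weights)[0]]

def adapt(policy, sequence, alpha=1.0):
    """Return a copy of the policy shifted toward the moves of `sequence`."""
    new_policy = dict(policy)
    state = []
    for best in sequence:
        moves = legal_moves(state)
        z = sum(math.exp(policy.get(code(state, m), 0.0)) for m in moves)
        for m in moves:
            # Lower every legal move in proportion to its current probability.
            p = math.exp(policy.get(code(state, m), 0.0)) / z
            key = code(state, m)
            new_policy[key] = new_policy.get(key, 0.0) - alpha * p
        # Raise the move that was actually played in the best sequence.
        key = code(state, best)
        new_policy[key] = new_policy.get(key, 0.0) + alpha
        state = state + [best]
    return new_policy

def nrpa(level, policy, rng, iterations=20):
    """Nested search: level 0 is a playout, higher levels adapt a local policy."""
    if level == 0:
        return playout(policy, rng)
    best_score, best_seq = -math.inf, []
    for _ in range(iterations):
        s, seq = nrpa(level - 1, policy, rng, iterations)
        if s >= best_score:
            best_score, best_seq = s, seq
        # Rebinding keeps the caller's policy untouched (adapt returns a copy).
        policy = adapt(policy, best_seq)
    return best_score, best_seq

if __name__ == "__main__":
    result_score, result_seq = nrpa(2, {}, random.Random(0))
    print(result_score, result_seq)
```

In this sketch, `legal_moves` is the only place where candidate moves are generated, so a more selective playout would presumably restrict the list returned there; how the restriction is chosen for each of the three problems is not specified in the abstract.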
