Beam Nested Rollout Policy Adaptation

The Nested Rollout Policy Adaptation (NRPA) algorithm is a tree search algorithm known to be efficient on combinatorial problems. However, it can converge to a local optimum and remain stuck there. We propose a modification that limits this behavior, and we evaluate it on two combinatorial problems on which NRPA is known to perform well.
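For reference, the baseline NRPA introduced by Rosin (2011), which the modification builds on, can be sketched as follows. This is a minimal sketch, not the proposed beam variant. It assumes a hypothetical six-city TSP instance, codes a move by the pair (current city, next city), and uses illustrative values for `iterations` and `alpha`; all of these are assumptions made for the example. It shows the three parts of the algorithm: a level-0 playout that samples each move with probability proportional to exp(weight), an `adapt` step that shifts the policy towards the best sequence found, and the nested recursion.

```python
import math
import random

# Hypothetical toy instance (not from the paper): 6 cities in the plane.
CITIES = [(0, 0), (1, 5), (4, 3), (6, 1), (3, 0), (5, 6)]


def dist(a, b):
    (x1, y1), (x2, y2) = CITIES[a], CITIES[b]
    return math.hypot(x1 - x2, y1 - y2)


def tour_score(seq):
    """Score to maximise: negative length of the closed tour 0 -> seq -> 0."""
    path = [0] + list(seq) + [0]
    return -sum(dist(a, b) for a, b in zip(path, path[1:]))


def legal_moves(visited):
    return [c for c in range(len(CITIES)) if c not in visited]


def playout(policy):
    """Level 0: build a full tour, picking each move with probability
    proportional to exp(policy[(current, move)]); unseen codes weigh 0."""
    current, visited, seq = 0, {0}, []
    while len(visited) < len(CITIES):
        moves = legal_moves(visited)
        weights = [math.exp(policy.get((current, m), 0.0)) for m in moves]
        move = random.choices(moves, weights=weights)[0]
        seq.append(move)
        visited.add(move)
        current = move
    return tour_score(seq), seq


def adapt(policy, seq, alpha=1.0):
    """Return a new policy moved towards seq: +alpha on each chosen move,
    minus alpha * (its probability under the OLD policy) on every legal move."""
    new_policy = dict(policy)
    current, visited = 0, {0}
    for move in seq:
        moves = legal_moves(visited)
        z = sum(math.exp(policy.get((current, m), 0.0)) for m in moves)
        for m in moves:
            prob = math.exp(policy.get((current, m), 0.0)) / z
            new_policy[(current, m)] = new_policy.get((current, m), 0.0) - alpha * prob
        new_policy[(current, move)] += alpha
        visited.add(move)
        current = move
    return new_policy


def nrpa(level, policy, iterations=50):
    """Nested search: each level runs `iterations` searches of the level
    below and adapts its own copy of the policy to its best sequence."""
    if level == 0:
        return playout(policy)
    best_score, best_seq = -math.inf, []
    for _ in range(iterations):
        score, seq = nrpa(level - 1, policy, iterations)
        if score >= best_score:
            best_score, best_seq = score, seq
        # adapt() returns a copy, so sub-levels never modify this level's policy.
        policy = adapt(policy, best_seq)
    return best_score, best_seq


if __name__ == "__main__":
    random.seed(0)
    print(nrpa(2, {}, iterations=20))
```

Because every `adapt` call pulls a level's policy towards that level's single best sequence, the policy can concentrate on one sequence; this is one way to see the local-optimum behavior described above.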