The Parallelization of Monte-Carlo Planning - Parallelization of MC-Planning

Since their impressive successes in various areas of large-scale parallelization, recent techniques like UCT and other Monte-Carlo planning variants (Kocsis and Szepesvari, 2006a) have been extensively studied (Coquelin and Munos, 2007; Wang and Gelly, 2007). Here we propose and compare various forms of parallelization of bandit-based tree search, in particular for our computer-go algorithm XYZ.

[1] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.

[2] Tristan Cazenave, et al. Combining Tactical Search and Monte-Carlo in the Game of Go, 2005, CIG.

[3] J. Banks, et al. Denumerable-Armed Bandits, 1992.

[4] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.

[5] David Silver, et al. Combining online and offline knowledge in UCT, 2007, ICML '07.

[6] Dimitri P. Bertsekas, et al. Approximate Dynamic Programming, 2017, Encyclopedia of Machine Learning and Data Mining.

[7] R. Agrawal. The Continuum-Armed Bandit Problem, 1995.

[8] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.

[9] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.

[10] Sylvain Gelly, et al. Modifications of UCT and sequence-like simulations for Monte-Carlo Go, 2007, 2007 IEEE Symposium on Computational Intelligence and Games.

[11] Claudio Gentile, et al. Adaptive and Self-Confident On-Line Learning Algorithms, 2000, J. Comput. Syst. Sci.

[12] Bernd Brügmann. Monte Carlo Go, 1993.

[13] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.

[14] Csaba Szepesvári, et al. Reduced-Variance Payoff Estimation in Adversarial Bandit Problems, 2005.

[15] Rémi Coulom, et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, 2006, Computers and Games.

[16] Rémi Munos, et al. Bandit Algorithms for Tree Search, 2007, UAI.

[17] Thomas P. Hayes, et al. Robbing the bandit: less regret in online geometric optimization against an adaptive adversary, 2006, SODA '06.

[18] Robert W. Chen, et al. Bandit problems with infinitely many arms, 1997.

[19] Rémi Coulom, et al. Computing "Elo Ratings" of Move Patterns in the Game of Go, 2007, J. Int. Comput. Games Assoc.