Minimax Strikes Back

Deep Reinforcement Learning (DRL) reaches a superhuman level of play in many complete-information games. The state-of-the-art search algorithm used in combination with DRL is Monte Carlo Tree Search (MCTS). We take a different approach to DRL, using a Minimax algorithm instead of MCTS and learning only the evaluation of states, not the policy. We show that for multiple games it is competitive with state-of-the-art DRL both in learning performance and in confrontations.
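To make the contrast with MCTS concrete, the sketch below shows a depth-limited alpha-beta Minimax search (in negamax form) that relies only on a learned state-evaluation function at the leaves, with no policy head. It is a minimal illustration under assumed interfaces, not the paper's exact algorithm: the `game` object and the `evaluate` network are hypothetical placeholders.

```python
# Minimal sketch (assumptions, not the paper's implementation): alpha-beta
# minimax with a learned state evaluation instead of a policy.
import math

def alphabeta(state, depth, alpha, beta, game, evaluate):
    """Return the negamax value of `state` for the player to move.

    `game` is assumed to expose is_terminal, terminal_value, legal_moves,
    and play; `evaluate` maps a non-terminal state to a scalar estimate
    in [-1, 1] from the perspective of the player to move.
    """
    if game.is_terminal(state):
        return game.terminal_value(state)      # exact game outcome
    if depth == 0:
        return evaluate(state)                 # learned leaf evaluation
    value = -math.inf
    for move in game.legal_moves(state):
        child = game.play(state, move)
        # Negamax: the child's value is negated for the parent player.
        value = max(value, -alphabeta(child, depth - 1, -beta, -alpha,
                                      game, evaluate))
        alpha = max(alpha, value)
        if alpha >= beta:                      # beta cutoff: prune siblings
            break
    return value
```

In such a setup, self-play games would be generated by calling `alphabeta` from the root and the resulting values used as training targets for `evaluate`, so only the value network is learned.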
