Final Adaptation Reinforcement Learning for N-Player Games

This paper covers n-tuple-based reinforcement learning (RL) algorithms for games. We present new algorithms for TD-, SARSA- and Q-learning which work seamlessly on various games with an arbitrary number of players. This is achieved by taking a player-centered view in which each player propagates his/her rewards back to previous rounds. We add a new element called Final Adaptation RL (FARL) to all these algorithms. Our main contribution is to show that FARL is a vitally important ingredient for achieving success with the player-centered view in various games. We report results on seven board games with 1, 2 and 3 players, including Othello, ConnectFour and Hex. In most cases we find that FARL is important for learning a near-perfect playing strategy. All algorithms are available in the GBG framework on GitHub.
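
The player-centered view and the FARL step can be made concrete with a short sketch. The Python fragment below is a minimal illustration under our own assumptions (a tabular value function V kept in a dictionary, hypothetical helpers td_update and final_adaptation); the actual GBG agents use n-tuple networks and are implemented in Java, so none of these names reflect the GBG API.

```python
# Minimal sketch, NOT the authors' GBG implementation. All names
# (td_update, final_adaptation, V, alpha, gamma) are illustrative assumptions.

def td_update(V, s_prev, s_next, reward, alpha=0.01, gamma=1.0):
    """Player-centered TD(0) step: the acting player bootstraps from the
    state it observes on its own next turn, so the same update rule works
    for 1, 2 or 3 players without game-specific sign flipping."""
    target = reward + gamma * V.get(s_next, 0.0)
    V[s_prev] = V.get(s_prev, 0.0) + alpha * (target - V.get(s_prev, 0.0))

def final_adaptation(V, last_state_of, final_reward_of, alpha=0.01):
    """FARL-style terminal step (sketch): when the episode ends, each player
    adapts the value of the last state seen from its own perspective toward
    its final reward, since no further own-turn state exists to bootstrap
    from."""
    for player, s_last in last_state_of.items():
        delta = final_reward_of[player] - V.get(s_last, 0.0)
        V[s_last] = V.get(s_last, 0.0) + alpha * delta
```

A driver loop would call td_update after each of a player's own moves and final_adaptation exactly once when the game terminates; without that final step, the terminal rewards of all players except the one to move last would never reach the value function.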
