Enhanced Reinforcement Learning Method Combining One-Hot Encoding-Based Vectors for CNN-Based Alternative High-Level Decisions

Gomoku is a two-player board game that originated in ancient China. Gomoku AIs have been developed with various techniques, such as genetic algorithms and tree search algorithms. AlphaGomoku, a Gomoku AI built on AlphaGo's algorithm, evaluates Gomoku board situations using Monte Carlo tree search (MCTS) and minimizes the probability of learning different correct answers for duplicated board situations. However, with tree search algorithms, accuracy drops because the classification criteria are set manually. In this paper, we propose an improved reinforcement learning-based high-level decision approach using a convolutional neural network (CNN). The proposed algorithm expresses each state as a one-hot encoded vector and determines the state of the Gomoku board by grouping similar one-hot encoded vectors. For the case where the stone position chosen by the CNN is already occupied or cannot be played, we propose a method for selecting an alternative move. We verify the proposed Gomoku AI in GuPyEngine, a Python-based 3D simulation platform.
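The two ideas in the abstract can be sketched briefly: encoding each board cell as a one-hot vector, and falling back to the best legal position when the network's top-scoring cell is already occupied. This is a minimal illustration, not the paper's implementation; the board size, value convention, and function names are assumptions.

```python
import numpy as np

BOARD = 15  # standard Gomoku board size (assumption)

def one_hot_board(board):
    """Encode a board of {0: empty, 1: black, 2: white} as a
    one-hot tensor of shape (BOARD, BOARD, 3)."""
    return np.eye(3)[board]

def pick_move(scores, board):
    """Choose the highest-scoring empty cell. If the network's top
    choice is occupied, masking it to -inf makes selection fall
    through to the next-best legal position (the 'alternative'
    high-level decision)."""
    masked = np.where(board == 0, scores, -np.inf)  # illegal cells -> -inf
    flat = int(np.argmax(masked))
    return divmod(flat, board.shape[1])  # (row, col)

board = np.zeros((BOARD, BOARD), dtype=int)
board[7, 7] = 1                 # black stone already at the centre
scores = np.random.rand(BOARD, BOARD)
scores[7, 7] = 10.0             # network (wrongly) prefers the occupied cell
row, col = pick_move(scores, board)
```

Here `pick_move` never returns `(7, 7)` even though that cell scores highest, because occupied cells are masked out before the argmax.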
