Development of a reinforcement learning system to play Othello

In general, the goal of a reinforcement learning system is to learn an optimal policy. In two-player games such as Othello, however, it is also important to acquire a penalty-avoiding policy, that is, a policy that avoids losing the game. The Penalty Avoiding Rational Policy Making algorithm (PARP) is known to learn such a policy. When PARP is applied to large-scale problems, however, it is confronted with an explosion in the number of states. In this article, we focus on Othello, a game with a huge state space, and introduce several ideas and heuristics to adapt PARP to it. We show that our learning player beats the well-known Othello program KITTY.
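The penalty-avoiding idea can be illustrated with a minimal sketch. It assumes a common reading of penalty avoidance: a rule (a state-action pair) is a penalty rule if it has directly led to a penalty or can lead into a penalty state, and a state is a penalty state if every known rule in it is a penalty rule. The function name `find_penalty_rules`, the dictionary layout, and the toy states `s0`, `s1`, `s2` are hypothetical and are not taken from the article; this is not the article's implementation and omits its Othello-specific heuristics.

```python
from collections import defaultdict


def find_penalty_rules(transitions, direct_penalties):
    """Propagate penalty labels over observed experience (illustrative only).

    transitions: dict mapping (state, action) -> set of observed next states.
    direct_penalties: set of (state, action) rules that directly gave a penalty.
    Returns (penalty_rules, penalty_states).
    """
    # Collect the known actions for every state seen as the source of a rule.
    actions_of = defaultdict(set)
    for state, action in list(transitions) + list(direct_penalties):
        actions_of[state].add(action)

    penalty_rules = set(direct_penalties)
    penalty_states = set()
    changed = True
    while changed:  # repeat until no new label is added
        changed = False
        # A state is a penalty state if all of its known rules are penalty rules.
        for state, actions in actions_of.items():
            if state not in penalty_states and all(
                (state, a) in penalty_rules for a in actions
            ):
                penalty_states.add(state)
                changed = True
        # A rule is a penalty rule if it may move into a penalty state.
        for rule, next_states in transitions.items():
            if rule not in penalty_rules and next_states & penalty_states:
                penalty_rules.add(rule)
                changed = True
    return penalty_rules, penalty_states


# Toy example (hypothetical): both moves in s1 lose immediately.
transitions = {
    ("s0", "a"): {"s1"},
    ("s0", "b"): {"s2"},
    ("s2", "a"): {"s0"},
}
direct_penalties = {("s1", "a"), ("s1", "b")}

rules, states = find_penalty_rules(transitions, direct_penalties)
# Actions in s0 that are not marked as penalty rules.
safe_in_s0 = sorted(a for a in ("a", "b") if ("s0", a) not in rules)
```

In this toy example, `s1` becomes a penalty state, so the rule `("s0", "a")` is marked as a penalty rule and only action `b` remains safe in `s0`. A table like `transitions` needs one entry per observed rule, which is where the state explosion arises: an Othello board has 64 squares, each empty, black, or white, so 3^64 is a loose upper bound on the number of board configurations.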
