Paper information - Effect of look-ahead search depth in learning position evaluation functions for Othello using ε-greedy exploration

Effect of look-ahead search depth in learning position evaluation functions for Othello using ε-greedy exploration

This paper studies the effect of varying the depth of look-ahead for heuristic search in temporal difference (TD) learning and game playing. The acquisition of position evaluation functions for the game of Othello is studied. The paper provides important insights into the strengths and weaknesses of using different search depths during learning when ε-greedy exploration is applied. The main finding is that, contrary to popular belief, for Othello, better playing strategies are found when TD learning is applied with lower look-ahead search depths.
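As a rough illustration of the two ingredients named in the abstract, the sketch below combines ε-greedy move selection with a depth-limited look-ahead (plain negamax). It is not the authors' implementation: every function name, the `depth` and `epsilon` parameters, and the toy take-1-or-2 stone game used in place of Othello are assumptions made only to keep the example small and runnable.

```python
import random

def negamax(state, depth, evaluate, legal_moves, apply_move):
    """Value of `state` for the player to move, searched `depth` plies deep."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        # Leaf: fall back on the (learned or hand-written) evaluation function.
        return evaluate(state)
    return max(
        -negamax(apply_move(state, m), depth - 1, evaluate, legal_moves, apply_move)
        for m in moves
    )

def epsilon_greedy_move(state, depth, epsilon, evaluate, legal_moves, apply_move,
                        rng=random):
    """With probability epsilon play a random move, otherwise the best move
    found by a look-ahead of `depth` plies (depth >= 1)."""
    moves = legal_moves(state)
    if rng.random() < epsilon:
        return rng.choice(moves)  # exploratory move
    # Greedy move: score each successor from the current player's point of view.
    return max(
        moves,
        key=lambda m: -negamax(apply_move(state, m), depth - 1,
                               evaluate, legal_moves, apply_move),
    )

# Hypothetical toy game standing in for Othello: a pile of stones, each player
# removes 1 or 2, and whoever takes the last stone wins.
def toy_legal_moves(pile):
    return [k for k in (1, 2) if k <= pile]

def toy_apply_move(pile, k):
    return pile - k

def toy_evaluate(pile):
    # An empty pile means the player to move has already lost; every other
    # position is scored 0.0 by this deliberately uninformed heuristic.
    return -1.0 if pile == 0 else 0.0

if __name__ == "__main__":
    move = epsilon_greedy_move(4, depth=3, epsilon=0.0,
                               evaluate=toy_evaluate,
                               legal_moves=toy_legal_moves,
                               apply_move=toy_apply_move)
    print(move)
```

In a TD learning setting, `evaluate` would be the position evaluation function being trained, and `depth` is the quantity whose effect the paper examines; in this toy game a deeper search simply compensates for the uninformed heuristic.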

Thomas Philip Runarsson | Egill Orn Jonsson
