Comparison Training for Computer Chinese Chess

This paper describes the application of modified comparison training to automatic feature-weight tuning, with the final objective of improving the evaluation functions used in Chinese chess programs. First, we apply n-tuple networks to extract features; n-tuple networks require very little expert knowledge because of their large number of features, while still allowing easy access. Second, we propose a modified comparison training method that incorporates tapered evaluation. Experiments show that, with the same features and the same Chinese chess program, the automatically tuned feature weights achieved a win rate of 86.58% against the hand-tuned weights. This trained version was then improved by adding further features, most importantly n-tuple features. The improved version achieved a win rate of 81.65% against the trained version without the additional features.
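To make the two ingredients concrete, the sketch below illustrates, under stated assumptions rather than as the paper's actual implementation, how an n-tuple feature lookup, a tapered evaluation, and a perceptron-style comparison-training update could fit together. All names and constants (`ntuple_index`, `tapered_eval`, `NUM_PIECE_CODES`, `phase`, the learning rate) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (assumptions noted below), not the authors' code.

NUM_PIECE_CODES = 15  # assumption: 7 piece types x 2 sides + empty square


def ntuple_index(board, squares):
    """Map the contents of a fixed set of squares to a weight-table index.
    `board` is assumed to be indexable by square and to return a piece code."""
    idx = 0
    for sq in squares:
        idx = idx * NUM_PIECE_CODES + board[sq]
    return idx


def tapered_eval(board, tuples, w_mid, w_end, phase):
    """Tapered evaluation: blend midgame and endgame feature weights.
    `phase` is assumed to run from 1.0 (opening) down to 0.0 (endgame)."""
    mid = sum(w_mid[t][ntuple_index(board, squares)]
              for t, squares in enumerate(tuples))
    end = sum(w_end[t][ntuple_index(board, squares)]
              for t, squares in enumerate(tuples))
    return phase * mid + (1.0 - phase) * end


def comparison_update(board_expert, board_other, tuples, w_mid, w_end,
                      phase, lr=0.01):
    """Perceptron-style comparison-training step (sketch): if the position
    reached by a non-expert move scores at least as high as the position
    reached by the expert's move, shift the weights of the active features."""
    if (tapered_eval(board_other, tuples, w_mid, w_end, phase)
            >= tapered_eval(board_expert, tuples, w_mid, w_end, phase)):
        for t, squares in enumerate(tuples):
            ie = ntuple_index(board_expert, squares)
            io = ntuple_index(board_other, squares)
            # Reward features of the expert-preferred position, penalize the
            # other, scaling each table's update by its share of the blend.
            w_mid[t][ie] += lr * phase
            w_mid[t][io] -= lr * phase
            w_end[t][ie] += lr * (1.0 - phase)
            w_end[t][io] -= lr * (1.0 - phase)
```

In this sketch the evaluation is linear in the n-tuple features, so the comparison-training update reduces to a simple perceptron-style correction on the two weight tables; the paper's modified training and search-based position generation involve details beyond this outline.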
