Comparison training of chess evaluation functions

The supervised learning methodology of "comparison training" (Tesauro 1989a) on a database of expert preferences is extended to search depths beyond 1-ply, and applied to the problem of training the weights in a linear evaluation function for the game of chess. An initial set of experiments was performed using SCP, a public-domain chess program. Training based on simple 1-ply searches was found to be ineffective, but for 1-ply plus quiescence expansion, high-quality solutions were found that outperform SCP's hand-tuned weights. The performance of the trained weights scaled well with search depth, and consistent improvement over the hand-tuned solution was found even for test depths much greater than the training search depth.

A discretized version of the algorithm was also developed and used to tune a subset of the weights in DEEP BLUE, having to do primarily with king safety evaluation. Training was based on 4-ply search (plus quiescence), and good test-set generalization was found out to 7-ply. During the 1997 rematch with Garry Kasparov, the tuning of the king-safety weights made a critical difference in one important position in game 2, and in the program's general understanding and handling of game 6.
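
As a rough illustration of the comparison-training idea for a linear evaluation function, the following Python sketch adjusts weights so that the position reached by the expert's move scores higher than positions reached by the alternatives. It is a minimal sketch under stated assumptions, not the procedure used for SCP or DEEP BLUE: the pairwise logistic loss, the learning rate, and the names `score`, `comparison_update`, and `train` are all hypothetical choices made for illustration. The feature vectors are assumed to be supplied by the caller; in the setting described above they would be taken from the positions produced by the training search (for example 1-ply plus quiescence) rather than from the immediate successor positions.

```python
import math


def score(weights, features):
    """Linear evaluation: dot product of weights and feature values."""
    return sum(w * f for w, f in zip(weights, features))


def comparison_update(weights, preferred, alternative, lr=0.1):
    """One gradient step on an assumed pairwise logistic loss.

    The loss is log(1 + exp(-margin)), where margin is the score of the
    expert-preferred position minus the score of the alternative.
    """
    diff = [p - a for p, a in zip(preferred, alternative)]
    margin = score(weights, diff)
    # Size of the correction: close to 1 when the alternative is ranked
    # well above the expert move, close to 0 when the order is already right.
    g = 1.0 / (1.0 + math.exp(margin))
    return [w + lr * g * d for w, d in zip(weights, diff)]


def train(weights, examples, epochs=50, lr=0.1):
    """examples: list of (preferred_features, list_of_alternative_features)."""
    for _ in range(epochs):
        for preferred, alternatives in examples:
            for alt in alternatives:
                weights = comparison_update(weights, preferred, alt, lr)
    return weights


# Toy data (invented for illustration): two features per position.
toy_examples = [
    ([1.0, 0.0], [[0.0, 1.0]]),
    ([2.0, 1.0], [[1.0, 2.0]]),
]
trained = train([0.0, 0.0], toy_examples)
```

After training on the toy data, the weights rank each expert-preferred position above its alternative. Because the evaluation is linear, each update depends only on the difference between the two feature vectors, which is why a single pass over preference pairs is enough to drive the step.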