Paper information - Temporal difference learning and TD-Gammon

Temporal difference learning and TD-Gammon

Ever since the days of Shannon's proposal for a chess-playing algorithm [12] and Samuel's checkers-learning program [10], the domain of complex board games such as Go, chess, checkers, Othello, and backgammon has been widely regarded as an ideal testing ground for exploring a variety of concepts and approaches in artificial intelligence and machine learning. Such board games offer the challenge of the tremendous complexity and sophistication required to play at expert level. At the same time, the problem inputs and performance measures are clear-cut and well defined, and the game environment is readily automated: it is easy to simulate the board, the rules of legal play, and the rules that determine when the game is over and what its outcome is.

Gerald Tesauro

[1] Claude E. Shannon, et al. Programming a computer for playing chess, 1950.

[2] Norman Zadeh, et al. On Optimal Doubling in Backgammon, 1977.

[3] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[4] James L. McClelland, et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations, 1986.

[5] Gerald Tesauro, et al. Neurogammon Wins Computer Olympiad, 1989, Neural Computation.

[6] Christian Lebiere, et al. The Cascade-Correlation Learning Architecture, 1989, NIPS.

[7] Kurt Hornik, et al. Multilayer feedforward networks are universal approximators, 1989, Neural Networks.

[8] Paul E. Utgoff, et al. Automatic Feature Generation for Problem Solving Systems, 1992, ML.

[9] Gerald Tesauro, et al. Practical Issues in Temporal Difference Learning, 1992, Mach. Learn.

[10] Terrence J. Sejnowski, et al. Temporal Difference Learning of Position Evaluation in the Game of Go, 1993, NIPS.