Sample-based learning and search with permanent and transient memories
[1] David Silver, et al. Combining online and offline knowledge in UCT, 2007, ICML '07.
[2] Olivier Teytaud, et al. Modification of UCT with Patterns in Monte-Carlo Go, 2006.
[3] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[4] Fredrik A. Dahl, et al. Honte, a go-playing program using neural nets, 2001.
[5] David Silver, et al. Combining Online and Offline Learning in UCT, 2007.
[6] Andrew Tridgell, et al. Experiments in Parameter Learning Using Temporal Differences, 1998, J. Int. Comput. Games Assoc.
[7] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[8] Richard S. Sutton, et al. On the role of tracking in stationary environments, 2007, ICML '07.
[9] Michael Buro, et al. From Simple Features to Sophisticated Evaluation Functions, 1998, Computers and Games.
[10] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[11] Terrence J. Sejnowski, et al. Temporal Difference Learning of Position Evaluation in the Game of Go, 1993, NIPS.
[12] Richard S. Sutton, et al. Reinforcement Learning of Local Shape in the Game of Go, 2007, IJCAI.
[13] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[14] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[15] Jonathan Schaeffer, et al. Temporal Difference Learning Applied to a High-Performance Game-Playing Program, 2001, IJCAI.
[16] Markus Enzenberger, et al. Evaluation in Go by a Neural Network using Soft Segmentation, 2003, ACG.
[17] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[18] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse, 1996.
[19] Richard S. Sutton, et al. Reinforcement Learning, 1992, Handbook of Machine Learning.
[20] Rémi Coulom, et al. Computing "Elo Ratings" of Move Patterns in the Game of Go, 2007, J. Int. Comput. Games Assoc.
[21] Gerald Tesauro, et al. On-line Policy Improvement using Monte-Carlo Search, 1996, NIPS.