Temporal-difference search in computer Go
David Silver | Richard S. Sutton | Martin Müller