Exploration exploitation in Go: UCT for Monte-Carlo Go
[1] Bernd Brügmann. Monte Carlo Go, 1993.
[2] Nicolò Cesa-Bianchi, et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.
[3] Bruno Bouzy, et al. Computer Go: An AI oriented survey, 2001, Artif. Intell.
[4] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[5] Akihiro Kishimoto, et al. A General Solution to the Graph History Interaction Problem, 2004, AAAI.
[6] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[7] Jan Willemson, et al. Improved Monte-Carlo Search, 2006.
[8] Rémi Coulom, et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, 2006, Computers and Games.
[9] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.