Learning with Monte-Carlo methods

The Monte-Carlo method in games and puzzles consists of playing random games, called playouts, in order to estimate the value of a position. The method is related to learning because the algorithm dynamically learns which moves are good and which are bad as more playouts are played: it keeps statistics on the outcomes of the random games that started with each move. This approach is strongly linked to the area of machine learning known as reinforcement learning, and it has benefited from research on the multi-armed bandit problem (Auer et al. 2002).
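To make this concrete, the following is a minimal sketch, in Python, of Monte-Carlo move selection on the toy game of Nim (players alternately take 1 to 3 stones; whoever takes the last stone wins). Each move's value is estimated as the win rate of the playouts that started with it, and playouts are allocated to moves with the UCB1 rule analyzed by Auer et al. (2002). The choice of game, the playout budget, and all function names are illustrative assumptions, not taken from the cited papers.

```python
import math
import random

def legal_moves(stones):
    """Moves available in the current position: take 1, 2, or 3 stones."""
    return [take for take in (1, 2, 3) if take <= stones]

def playout(stones, to_move, player):
    """Finish the game with uniformly random moves.
    Returns 1 if `player` took the last stone, else 0."""
    while stones > 0:
        stones -= random.choice(legal_moves(stones))
        to_move = 1 - to_move
    return 1 if to_move != player else 0  # the previous mover won

def monte_carlo_move(stones, player, budget=3000):
    """Learn move values by keeping win/visit statistics over `budget` playouts."""
    moves = legal_moves(stones)
    wins = {m: 0 for m in moves}
    visits = {m: 0 for m in moves}
    for t in range(1, budget + 1):
        def ucb1(m):
            # UCB1 (Auer et al. 2002): balance exploiting moves with good
            # statistics and exploring moves that were rarely tried.
            if visits[m] == 0:
                return float("inf")  # try every move at least once
            return wins[m] / visits[m] + math.sqrt(2 * math.log(t) / visits[m])
        move = max(moves, key=ucb1)
        wins[move] += playout(stones - move, 1 - player, player)
        visits[move] += 1
    return max(moves, key=lambda m: visits[m])  # most-played move is best

if __name__ == "__main__":
    # From 5 stones, taking 1 leaves 4, a losing position for the opponent,
    # so the playout statistics should single out move 1.
    print(monte_carlo_move(stones=5, player=0))
```

Treating each move at the root as a bandit arm in this way is exactly the connection exploited by the UCT algorithm of reference [2], which applies the bandit rule recursively inside a search tree.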

[1] Yngvi Björnsson et al. CadiaPlayer: A Simulation-Based General Game Player. IEEE Transactions on Computational Intelligence and AI in Games, 2009.

[2] Csaba Szepesvári et al. Bandit Based Monte-Carlo Planning. ECML, 2006.

[3] Peter Auer et al. Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning, 2002.

[4] David Silver et al. Combining online and offline knowledge in UCT. ICML '07, 2007.

[5] Rémi Coulom. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. Computers and Games, 2006.