Single-player Monte-Carlo tree search for SameGame

Classic methods such as A* and IDA* are a popular and successful choice for one-player games. However, they fail without an accurate admissible evaluation function. In this article we investigate whether Monte-Carlo tree search (MCTS) is a viable alternative for one-player games in which A* and IDA* do not perform well. To that end, we propose a new MCTS variant, called single-player Monte-Carlo tree search (SP-MCTS). The selection and backpropagation strategies in SP-MCTS differ from those of standard MCTS. Moreover, SP-MCTS makes use of randomized restarts. We tested IDA* and SP-MCTS on the puzzle SameGame and used the cross-entropy method to tune the SP-MCTS parameters. Our SP-MCTS program scored a substantial number of points on the standardized test set.
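As a rough illustration of the randomized-restart idea mentioned above, the following Python sketch splits a fixed simulation budget across several independently seeded runs and keeps the best result. It is not the SP-MCTS implementation: `random_search` is a hypothetical stand-in for a single randomized search run, and `toy_score` is an invented objective that rewards runs of equal moves, used only so that the example is self-contained.

```python
import random
from typing import Callable, List, Tuple

Score = Callable[[List[int]], int]


def toy_score(seq: List[int]) -> int:
    """Hypothetical objective: sum of squared lengths of runs of equal moves."""
    total, run = 0, 1
    for prev, cur in zip(seq, seq[1:]):
        if cur == prev:
            run += 1
        else:
            total += run * run
            run = 1
    return total + run * run


def random_search(seed: int, budget: int, evaluate: Score,
                  n_moves: int = 10, n_choices: int = 4) -> Tuple[float, List[int]]:
    """Stand-in for one randomized search run: sample `budget` random move
    sequences and return the best (score, sequence) found."""
    rng = random.Random(seed)
    best_score, best_seq = float("-inf"), []
    for _ in range(budget):
        seq = [rng.randrange(n_choices) for _ in range(n_moves)]
        score = evaluate(seq)
        if score > best_score:
            best_score, best_seq = score, seq
    return best_score, best_seq


def search_with_restarts(total_budget: int, n_restarts: int, evaluate: Score,
                         base_seed: int = 0) -> Tuple[float, List[int]]:
    """Divide the budget evenly over `n_restarts` runs, each with its own
    seed, and return the best result over all runs."""
    per_run = max(1, total_budget // n_restarts)
    results = [random_search(base_seed + i, per_run, evaluate)
               for i in range(n_restarts)]
    return max(results, key=lambda r: r[0])


if __name__ == "__main__":
    score, moves = search_with_restarts(1000, 5, toy_score)
    print(score, moves)
```

With a uniform random sampler such as this one, restarting brings no real benefit. The scheme matters for searches like MCTS, where an early decision can commit a run to a poor region of the tree and a fresh seed gives it a chance to escape.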
