Knowledge Generation for Improving Simulations in UCT for General Game Playing

General Game Playing (GGP) aims at developing game-playing agents that are able to play a variety of games and, in the absence of pre-programmed game-specific knowledge, become proficient players. Most GGP players have used standard tree-search techniques enhanced by automatic heuristic learning. The UCT algorithm, a simulation-based tree search, is a newer approach and has been used successfully in GGP. However, it relies heavily on random simulations to assign values to unvisited nodes and to select nodes when descending the tree, which can slow UCT's convergence. In this paper, we discuss the generation and evolution of domain-independent knowledge using both state and move patterns, which is then used to guide the simulations in UCT. To test the improvements, we create matches between a player using the standard UCT algorithm and one using UCT enhanced with knowledge.
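For context, the node-selection step of UCT that the abstract refers to is typically the UCB1 rule: descend to the child maximizing average reward plus an exploration bonus. The following is a minimal illustrative sketch, not the paper's implementation; the data layout (a dict of per-move visit counts and accumulated rewards) and the constant `c` are assumptions.

```python
import math

def ucb1_select(children, total_visits, c=1.4):
    """Pick the child move maximizing the UCB1 score: mean reward plus an
    exploration bonus that shrinks as a child is visited more often.
    `children` maps a move to a (visit_count, total_reward) pair."""
    best_move, best_score = None, float("-inf")
    for move, (visits, reward) in children.items():
        if visits == 0:
            return move  # unvisited children are expanded first
        score = reward / visits + c * math.sqrt(math.log(total_visits) / visits)
        if score > best_score:
            best_move, best_score = move, score
    return best_move
```

Below the tree's frontier, plain UCT plays moves uniformly at random to the end of the game; the knowledge-guided variant discussed in the paper biases those rollout moves instead of choosing them uniformly.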
