Generating Artificial Neural Networks for Value Function Approximation in a Domain Requiring a Shifting Strategy

Artificial neural networks have been used successfully to approximate value functions for decision-making tasks. In domains where the decision-making strategy must shift as the overall state changes, we hypothesize that approximating the value function with multiple neural networks is likely to outperform approaches that employ a single network. For our experiments, the card game Dominion was chosen as the domain. This work compares neural networks trained by machine learning methods that have been applied successfully to other games (such as temporal-difference learning in TD-Gammon) against a genetic-algorithm approach that evolves two neural networks for different phases of the game, along with the transition point between them. The results demonstrate a greater success ratio for the hypothesized method. This suggests that future work examining more complex multiple-network configurations could apply to this game domain and to other problems.
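
To make the core idea concrete, the following is a minimal sketch, in Python, of a value function built from two networks with a transition point that decides which network scores a given state. The class and parameter names (ValueNetwork, PhasedValueFunction, phase_signal, transition_point) are hypothetical illustrations and not taken from the paper; the phase signal is assumed to be a scalar measure of game progress, such as the fraction of Province cards already purchased in Dominion.

```python
import numpy as np

class ValueNetwork:
    """Minimal feedforward value network: one hidden layer, scalar output."""
    def __init__(self, n_inputs, n_hidden, rng):
        self.w1 = rng.normal(0.0, 0.1, size=(n_hidden, n_inputs))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, size=n_hidden)
        self.b2 = 0.0

    def value(self, features):
        hidden = np.tanh(self.w1 @ features + self.b1)
        return float(np.tanh(self.w2 @ hidden + self.b2))

class PhasedValueFunction:
    """Two value networks plus a transition point (hypothetical construction).

    Once the phase signal crosses the evolved transition point, the late-game
    network is consulted instead of the early-game one.
    """
    def __init__(self, early_net, late_net, transition_point):
        self.early_net = early_net
        self.late_net = late_net
        self.transition_point = transition_point

    def value(self, features, phase_signal):
        net = self.early_net if phase_signal < self.transition_point else self.late_net
        return net.value(features)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    early = ValueNetwork(n_inputs=8, n_hidden=6, rng=rng)
    late = ValueNetwork(n_inputs=8, n_hidden=6, rng=rng)
    vf = PhasedValueFunction(early, late, transition_point=0.5)

    state_features = rng.uniform(size=8)               # placeholder state encoding
    print(vf.value(state_features, phase_signal=0.2))  # early-game network used
    print(vf.value(state_features, phase_signal=0.8))  # late-game network used
```

In a genetic-algorithm setting such as the one described, the weights of both networks and the transition point would together form an individual's genome, with fitness assessed by playing games; the sketch above only illustrates how the evolved transition point selects which network evaluates a state.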
