RSPSA: Enhanced Parameter Optimization in Games

Most game programs have a large number of parameters that are crucial for their performance. Tuning these parameters by hand is difficult, so the automatic optimization of game-program parameters is an interesting research domain. However, successful applications are known only for parameters that belong to certain components (e.g., evaluation-function parameters). The SPSA (Simultaneous Perturbation Stochastic Approximation) algorithm is an attractive choice for optimizing any kind of game-program parameter, owing to both its generality and its simplicity. Its disadvantage is that it can be very slow. In this article we propose several methods to speed up SPSA: combining it with RPROP, using common random numbers, using antithetic variables, and averaging. We test the resulting algorithm, RSPSA, by tuning various types of parameters in two domains, Poker and Lines of Action (LOA). From the experimental study we conclude that SPSA is a viable approach to optimization in game programs, in particular when no good alternative exists for the types of parameters considered.
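To make the basic idea concrete, the following is a minimal sketch of plain two-sided SPSA with common random numbers. It is not the RSPSA algorithm of the article: the RPROP step-size rule, antithetic variables, and averaging are left out. The objective `noisy_quadratic`, the function names, and the gain constants are assumptions chosen for illustration; in a game program the objective would be a noisy performance measure such as the outcome of test games.

```python
import random

def spsa_gradient(f, theta, c, rng):
    """One two-sided SPSA gradient estimate of f at theta."""
    # Perturb all parameters at once: each component of delta is +1 or -1.
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    # Common random numbers: both evaluations use the same seed, so the
    # evaluation noise is positively correlated and largely cancels.
    seed = rng.getrandbits(32)
    diff = f(plus, random.Random(seed)) - f(minus, random.Random(seed))
    # Two evaluations give an estimate for every coordinate.
    return [diff / (2.0 * c * d) for d in delta]

def noisy_quadratic(theta, rng):
    """Hypothetical objective: quadratic with minimum at (1, -2) plus noise."""
    target = (1.0, -2.0)
    clean = sum((t - g) ** 2 for t, g in zip(theta, target))
    return clean + rng.gauss(0.0, 0.1)

def spsa_minimize(f, theta, steps=2000, a=0.1, c=0.1, seed=0):
    """Minimize f by stochastic gradient descent on SPSA estimates."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(1, steps + 1):
        a_k = a / k ** 0.602  # decaying step size (assumed gain sequence)
        c_k = c / k ** 0.101  # decaying perturbation size (assumed)
        grad = spsa_gradient(f, theta, c_k, rng)
        theta = [t - a_k * g for t, g in zip(theta, grad)]
    return theta

result = spsa_minimize(noisy_quadratic, [0.0, 0.0])
```

Each iteration costs two evaluations of the objective regardless of the number of parameters, which is what makes the method general. The variance of the resulting gradient estimate is what the speed-up techniques listed in the abstract aim to reduce.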
