Non-Linear Monte-Carlo Search in Civilization II

This paper presents a new Monte-Carlo search algorithm for very large sequential decision-making problems. We apply non-linear regression within Monte-Carlo search, online, to estimate a state-action value function from the outcomes of random rollouts. This value function generalizes between related states and actions, and can therefore provide more accurate evaluations after fewer rollouts. A further significant advantage of this approach is its ability to automatically extract and leverage domain knowledge from external sources such as game manuals. We apply our algorithm to the game of Civilization II, a challenging multi-agent strategy game with an enormous state space and around 10^21 joint actions. We approximate the value function by a neural network, augmented with linguistic knowledge extracted automatically from the official game manual. We show that this non-linear value function is significantly more efficient than a linear value function, which is itself more efficient than Monte-Carlo tree search. Our non-linear Monte-Carlo search wins over 78% of games against the built-in AI of Civilization II.
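The core loop described above can be illustrated with a short sketch. The following is a minimal illustration, not the authors' implementation: it assumes a hypothetical simulator interface (`clone`, `legal_actions`, `features`, `step`, `score`) and uses a one-hidden-layer network as the non-linear state-action value function, regressing it online toward rollout outcomes.

```python
# A minimal sketch of the idea in the abstract, not the authors' code:
# fit a non-linear state-action value function online by regressing it
# toward the outcomes of Monte-Carlo rollouts. The simulator interface
# (clone, legal_actions, features, step, score) is hypothetical.
import numpy as np


class NeuralValueFunction:
    """One-hidden-layer network Q(s, a), trained by online SGD."""

    def __init__(self, n_features, n_hidden=32, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_features))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, n_hidden)
        self.b2 = 0.0
        self.lr = lr

    def predict(self, x):
        h = np.tanh(self.W1 @ x + self.b1)
        return self.w2 @ h + self.b2, h

    def update(self, x, target):
        # One SGD step on the squared regression error (Q(s,a) - outcome)^2.
        q, h = self.predict(x)
        err = q - target
        grad_z = err * self.w2 * (1.0 - h ** 2)  # backprop through tanh
        self.w2 -= self.lr * err * h
        self.b2 -= self.lr * err
        self.W1 -= self.lr * np.outer(grad_z, x)
        self.b1 -= self.lr * grad_z


def non_linear_mc_search(env, state, value_fn, n_rollouts=100,
                         depth=20, eps=0.2):
    """Run rollouts from `state`, regress Q toward each outcome, act greedily."""
    for _ in range(n_rollouts):
        sim, visited = env.clone(state), []
        for _ in range(depth):
            actions = sim.legal_actions()
            if not actions:
                break
            # Epsilon-greedy rollout policy guided by the learned Q.
            if np.random.random() < eps:
                a = actions[np.random.randint(len(actions))]
            else:
                a = max(actions,
                        key=lambda act: value_fn.predict(sim.features(act))[0])
            visited.append(sim.features(a))  # features of (state, action) pair
            sim.step(a)
        outcome = sim.score()  # final rollout outcome (e.g. game score)
        for x in visited:      # online non-linear regression toward outcome
            value_fn.update(x, outcome)
    # Act greedily at the root with respect to the fitted value function.
    root = env.clone(state)
    return max(root.legal_actions(),
               key=lambda act: value_fn.predict(root.features(act))[0])
```

In the paper itself, the network's inputs are further augmented with linguistic features extracted from the official game manual; the sketch above omits that component and shows only the online non-linear regression within search.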