Learning to Win by Reading Manuals in a Monte-Carlo Framework

This paper presents a novel approach for leveraging automatically extracted textual knowledge to improve the performance of control applications such as games. Our ultimate goal is to enrich a stochastic player with high-level guidance expressed in text. Our model jointly learns to identify text relevant to a given game state and to select game strategies guided by that text. Our method operates in the Monte-Carlo search framework and learns both text analysis and game strategies based only on environment feedback. We apply our approach to the complex strategy game Civilization II, using the official game manual as the text guide. Our results show that a linguistically informed game-playing agent significantly outperforms its language-unaware counterpart, yielding a 27% absolute improvement and winning over 78% of games against the built-in AI of Civilization II.
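
The central idea admits a compact sketch. The snippet below is a minimal, self-contained Python illustration, not the authors' implementation: a softmax layer soft-selects the manual sentence most relevant to a candidate action, a linear output layer scores the action from the resulting features, and both layers are trained purely from the outcomes of simulated rollouts, mirroring the "environment feedback only" training described above. The feature table PHI, the rollout() function, and all dimensions are hypothetical placeholders; the real system derives features from the Civilization II game state, the candidate action, and the words of each manual sentence.

```python
import numpy as np

# Sketch of the two-layer model: a softmax "relevance" layer over manual
# sentences feeding a linear action-value layer, trained from rollout returns.
# PHI and rollout() are synthetic stand-ins, not real game/manual features.

rng = np.random.default_rng(0)
N_ACT, N_SENT, D = 5, 40, 16                # actions, manual sentences, feature dim
PHI = rng.normal(size=(N_ACT, N_SENT, D))   # hypothetical (action, sentence) features
TRUE_Q = rng.normal(size=N_ACT)             # hidden action values the rollouts sample

u = np.zeros(D)        # sentence-relevance weights (softmax layer)
w = np.zeros(D)        # action-value weights (output layer)
alpha = 0.05           # learning rate

def q_value(a):
    """Soft-select a relevant manual sentence, then score action a."""
    phi = PHI[a]                              # (N_SENT, D) joint features
    z = phi @ u
    p = np.exp(z - z.max()); p /= p.sum()     # relevance distribution over sentences
    phi_bar = p @ phi                         # features averaged under the selection
    return phi_bar @ w, p, phi, phi_bar

def rollout(a):
    """Stand-in for a Monte-Carlo rollout: a noisy sample of the game outcome."""
    return TRUE_Q[a] + rng.normal(scale=0.5)

for _ in range(2000):                         # simulation-based learning loop
    a = int(rng.integers(N_ACT))              # explore candidate actions uniformly
    q, p, phi, phi_bar = q_value(a)
    err = rollout(a) - q                      # feedback: return minus prediction
    w += alpha * err * phi_bar                # gradient step on the value layer
    scores = phi @ w                          # each sentence's implied action score
    u += alpha * err * ((p * scores) @ (phi - phi_bar))  # softmax-layer gradient

best = max(range(N_ACT), key=lambda a: q_value(a)[0])
print("greedy action:", best, "| true best:", int(TRUE_Q.argmax()))
```

Note that the relevance weights u receive gradient only through the value-prediction error, so a sentence is judged useful exactly insofar as attending to it improves the estimate of the rollout return; this is the sense in which text analysis is learned without any linguistic annotation.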
