Hierarchical Decision Making by Generating and Following Natural Language Instructions

We explore using latent natural language instructions as an expressive and compositional representation of complex actions for hierarchical decision making. Rather than directly selecting micro-actions, our agent first generates a latent plan in natural language, which is then executed by a separate model. We introduce a challenging real-time strategy game environment in which the actions of a large number of units must be coordinated across long time scales. We gather a dataset of 76 thousand pairs of instructions and executions from human play, and train instructor and executor models. Experiments show that models using natural language as a latent variable significantly outperform models that directly imitate human actions. The compositional structure of language proves crucial to its effectiveness for action representation. We also release our code, models and data.

[1]  Nicolas Usunier,et al.  Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks , 2016, ArXiv.

[2]  Matthew R. Walter,et al.  Navigational Instruction Generation as Inverse Reinforcement Learning with Neural Machine Translation , 2016, 2017 12th ACM/IEEE International Conference on Human-Robot Interaction (HRI.

[3]  Shie Mannor,et al.  A Deep Hierarchical Approach to Lifelong Learning in Minecraft , 2016, AAAI.

[4]  Yuandong Tian,et al.  ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games , 2017, NIPS.

[5]  Matthew R. Walter,et al.  Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences , 2015, AAAI.

[6]  Tom Schaul,et al.  StarCraft II: A New Challenge for Reinforcement Learning , 2017, ArXiv.

[7]  Dan Klein,et al.  Unified Pragmatic Models for Generating and Following Instructions , 2017, NAACL.

[8]  Liang Wang,et al.  Hierarchical Macro Strategy Model for MOBA Game AI , 2018, AAAI.

[9]  Mike Lewis,et al.  Hierarchical Text Generation and Planning for Strategic Dialogue , 2017, ICML.

[10]  Doina Precup,et al.  Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..

[11]  Honglak Lee,et al.  Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning , 2017, ICML.

[12]  Michael Buro,et al.  Build Order Optimization in StarCraft , 2011, AIIDE.

[13]  Razvan Pascanu,et al.  Relational Deep Reinforcement Learning , 2018, ArXiv.

[14]  Rob Fergus,et al.  MazeBase: A Sandbox for Learning from Games , 2015, ArXiv.

[15]  Florian Richoux,et al.  TorchCraft: a Library for Machine Learning Research on Real-Time Strategy Games , 2016, ArXiv.

[16]  Jun Wang,et al.  Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games , 2017, ArXiv.

[17]  Michael Buro,et al.  Fast Heuristic Search for RTS Game Combat Scenarios , 2012, AIIDE.

[18]  Doina Precup,et al.  The Option-Critic Architecture , 2016, AAAI.

[19]  Alex Graves,et al.  Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[20]  Dan Klein,et al.  Modular Multitask Reinforcement Learning with Policy Sketches , 2016, ICML.

[21]  Luke S. Zettlemoyer,et al.  Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions , 2013, TACL.

[22]  Bo Li,et al.  TStarBots: Defeating the Cheating Level Builtin AI in StarCraft II in the Full Game , 2018, ArXiv.

[23]  Demis Hassabis,et al.  Grounded Language Learning in a Simulated 3D World , 2017, ArXiv.

[24]  Jason Weston,et al.  ParlAI: A Dialog Research Software Platform , 2017, EMNLP.

[25]  Michal Certický Implementing a Wall-In Building Placement in StarCraft with Declarative Programming , 2013, ArXiv.

[26]  Dan Klein,et al.  Speaker-Follower Models for Vision-and-Language Navigation , 2018, NeurIPS.

[27]  Luke S. Zettlemoyer,et al.  Online Learning of Relaxed CCG Grammars for Parsing to Logical Form , 2007, EMNLP.

[28]  Santiago Ontañón,et al.  A Survey of Real-Time Strategy Game AI Research and Competition in StarCraft , 2013, IEEE Transactions on Computational Intelligence and AI in Games.