RTFM: Generalising to Novel Environment Dynamics via Reading

Obtaining policies that generalise to new environments is a central challenge in reinforcement learning. In this work, we demonstrate that language understanding via a reading policy learner is a promising vehicle for such generalisation. We propose a grounded policy learning problem, Read to Fight Monsters (RTFM), in which the agent must jointly reason over a language goal, relevant dynamics described in a document, and environment observations. We procedurally generate environment dynamics together with corresponding language descriptions, so that agents must read to understand new dynamics rather than memorise any particular configuration. In addition, we propose txt2$\pi$, a model that captures three-way interactions between the goal, the document, and the observations. On RTFM, txt2$\pi$ generalises via reading to environments whose dynamics were not seen during training, and it outperforms baselines such as FiLM and language-conditioned CNNs. Through curriculum learning, txt2$\pi$ produces policies that excel on complex RTFM tasks requiring several steps of reasoning and coreference resolution.
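
The three-way conditioning in txt2$\pi$ builds on FiLM (feature-wise linear modulation), the baseline named above, which scales and shifts visual feature maps using parameters predicted from a text encoding. The following is a minimal PyTorch sketch of a single FiLM layer; the class name, dimensions, and usage are illustrative assumptions for exposition, not the paper's released implementation.

```python
import torch
import torch.nn as nn

class FiLMLayer(nn.Module):
    """Modulates visual feature maps with per-channel scale (gamma) and
    shift (beta) vectors predicted from a language embedding."""

    def __init__(self, text_dim: int, n_channels: int):
        super().__init__()
        # One affine transform predicts both gamma and beta.
        self.affine = nn.Linear(text_dim, 2 * n_channels)

    def forward(self, visual: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # visual: (batch, channels, height, width); text: (batch, text_dim)
        gamma, beta = self.affine(text).chunk(2, dim=-1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)  # broadcast over H, W
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return (1 + gamma) * visual + beta

# Hypothetical usage: condition 64-channel observation features on a
# 128-dimensional encoding of the goal text.
film = FiLMLayer(text_dim=128, n_channels=64)
obs_features = torch.randn(8, 64, 6, 6)   # e.g. encoded grid observations
goal_encoding = torch.randn(8, 128)       # e.g. an LSTM summary of the goal
modulated = film(obs_features, goal_encoding)
```

Plain FiLM conditions the observation stream on text in one direction only; capturing the three-way goal-document-observation interactions described above additionally requires modulating the text representation by the visual features, which is the intuition behind the model's bidirectional conditioning.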
