Grounding Language for Transfer in Deep Reinforcement Learning

In this paper, we explore the utilization of natural language to drive transfer for reinforcement learning (RL). Despite the wide-spread application of deep RL techniques, learning generalized policy representations that work across domains remains a challenging problem. We demonstrate that textual descriptions of environments provide a compact intermediate channel to facilitate effective policy transfer. Specifically, by learning to ground the meaning of text to the dynamics of the environment such as transitions and rewards, an autonomous agent can effectively bootstrap policy learning on a new domain given its description. We employ a model-based RL approach consisting of a differentiable planning module, a model-free component and a factorized state representation to effectively use entity descriptions. Our model outperforms prior work on both transfer and multi-task scenarios in a variety of different environments. For instance, we achieve up to 14% and 11.5% absolute improvement over previously existing models in terms of average and initial rewards, respectively.

[1]  Dileep George,et al.  Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics , 2017, ICML.

[2]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[3]  Wojciech Zaremba,et al.  Domain randomization for transferring deep neural networks from simulation to the real world , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[4]  Tom Schaul,et al.  Universal Value Function Approximators , 2015, ICML.

[5]  Dan Klein,et al.  Alignment-Based Compositional Semantics for Instruction Following , 2015, EMNLP.

[6]  Noah A. Smith,et al.  Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , 2016, ACL 2016.

[7]  Julian Togelius,et al.  General Video Game AI: Competition, Challenges and Opportunities , 2016, AAAI.

[8]  Ruslan Salakhutdinov,et al.  Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning , 2015, ICLR.

[9]  Peter Stone,et al.  Transfer Learning via Inter-Task Mappings for Temporal Difference Learning , 2007, J. Mach. Learn. Res..

[10]  Michael D. Buhrmester,et al.  Amazon's Mechanical Turk , 2011, Perspectives on psychological science : a journal of the Association for Psychological Science.

[11]  Matthew E. Taylor,et al.  Initial Progress in Transfer for Deep Reinforcement Learning Algorithms , 2016 .

[12]  Luke S. Zettlemoyer,et al.  Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions , 2013, TACL.

[13]  Philip H. S. Torr,et al.  An embarrassingly simple approach to zero-shot learning , 2015, ICML.

[14]  Regina Barzilay,et al.  Learning to Win by Reading Manuals in a Monte-Carlo Framework , 2011, ACL.

[15]  Sinno Jialin Pan,et al.  Knowledge Transfer for Deep Reinforcement Learning with Hierarchical Experience Replay , 2017, AAAI.

[16]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[17]  Luke S. Zettlemoyer,et al.  Learning to Parse Natural Language Commands to a Robot Control System , 2012, ISER.

[18]  Matthew R. Walter,et al.  Learning spatial-semantic representations from natural language descriptions and scene classifications , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[19]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[20]  Pieter Abbeel,et al.  Value Iteration Networks , 2016, NIPS.

[21]  Alex Graves,et al.  Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[22]  Peter Stone,et al.  Transferring Instances for Model-Based Reinforcement Learning , 2008, ECML/PKDD.

[23]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[24]  Peter Stone,et al.  Transfer Learning for Reinforcement Learning Domains: A Survey , 2009, J. Mach. Learn. Res..

[25]  Matthew R. Walter,et al.  Learning Semantic Maps from Natural Language Descriptions , 2013, Robotics: Science and Systems.

[26]  Rich Caruana,et al.  Multitask Learning , 1997, Machine-mediated learning.

[27]  Eric Eaton,et al.  Using Task Features for Zero-Shot Knowledge Transfer in Lifelong Learning , 2016, IJCAI.

[28]  Tom Schaul,et al.  Successor Features for Transfer in Reinforcement Learning , 2016, NIPS.

[29]  Andrew G. Barto,et al.  Transfer in Reinforcement Learning via Shared Features , 2012, J. Mach. Learn. Res..

[30]  Luke S. Zettlemoyer,et al.  Reading between the Lines: Learning to Map High-Level Instructions to Commands , 2010, ACL.

[31]  Peter Dayan,et al.  Improving Generalization for Temporal Difference Learning: The Successor Representation , 1993, Neural Computation.

[32]  Shane Legg,et al.  Massively Parallel Methods for Deep Reinforcement Learning , 2015, ArXiv.

[33]  Dongho Kim,et al.  POMDP-based dialogue manager adaptation to extended domains , 2013, SIGDIAL Conference.

[34]  Eric Eaton,et al.  Online Multi-Task Learning for Policy Gradient Methods , 2014, ICML.

[35]  Tom Schaul,et al.  A video game description language for model-based or interactive learning , 2013, 2013 IEEE Conference on Computational Inteligence in Games (CIG).

[36]  Tze-Yun Leong,et al.  Transferring Expectations in Model-based Reinforcement Learning , 2012, NIPS.

[37]  Sergey Levine,et al.  Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning , 2017, ICLR.

[38]  Shaogang Gong,et al.  Unsupervised Domain Adaptation for Zero-Shot Learning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[39]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[40]  Jianfeng Gao,et al.  Deep Reinforcement Learning with a Natural Language Action Space , 2015, ACL.

[41]  Regina Barzilay,et al.  Non-Linear Monte-Carlo Search in Civilization II , 2011, IJCAI.

[42]  Girish Chowdhary,et al.  Cross-Domain Transfer in Reinforcement Learning Using Target Apprentice , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[43]  Andrew G. Barto,et al.  Building Portable Options: Skill Transfer in Reinforcement Learning , 2007, IJCAI.

[44]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[45]  Balaraman Ravindran,et al.  $A^2T$: Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources , 2015 .

[46]  Honglak Lee,et al.  Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning , 2017, ICML.

[47]  G. Konidaris A Framework for Transfer in Reinforcement Learning , 2006 .

[48]  Regina Barzilay,et al.  Language Understanding for Text-based Games using Deep Reinforcement Learning , 2015, EMNLP.

[49]  Dan Klein,et al.  Modular Multitask Reinforcement Learning with Policy Sketches , 2016, ICML.

[50]  Andrew Y. Ng,et al.  Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.

[51]  Mohammad Rostami,et al.  Joint Dictionaries for Zero-Shot Learning , 2017, AAAI.

[52]  Daniel Jurafsky,et al.  Learning to Follow Navigational Directions , 2010, ACL.

[53]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[54]  Peter Stone,et al.  Learning Inter-Task Transferability in the Absence of Target Task Samples , 2015, AAMAS.

[55]  Yannis Stylianou,et al.  Learning Domain-Independent Dialogue Policies via Ontology Parameterisation , 2015, SIGDIAL Conference.

[56]  Peter Stone,et al.  Value-Function-Based Transfer for Reinforcement Learning Using Structure Mapping , 2006, AAAI.

[57]  Ruben Glatt,et al.  Policy Reuse in Deep Reinforcement Learning , 2017, AAAI.

[58]  Peter Stone,et al.  Cross-domain transfer for reinforcement learning , 2007, ICML '07.

[59]  Sergey Levine,et al.  End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..

[60]  Eric Eaton,et al.  ELLA: An Efficient Lifelong Learning Algorithm , 2013, ICML.

[61]  Stefanie Tellex,et al.  Toward understanding natural language directions , 2010, HRI 2010.

[62]  Mark O. Riedl,et al.  Guiding Reinforcement Learning Exploration Using Natural Language , 2017, AAMAS.