PixL2R: Guiding Reinforcement Learning Using Natural Language by Mapping Pixels to Rewards

Reinforcement learning (RL), particularly in sparse reward settings, often requires prohibitively many interactions with the environment, limiting its applicability to complex problems. To address this, several prior approaches have used natural language to guide the agent's exploration. However, these approaches typically operate on structured representations of the environment and/or assume some structure in the natural language commands. In this work, we propose a model that directly maps pixels to rewards, given a free-form natural language description of the task, which can then be used for policy learning. Our experiments on the Meta-World robot manipulation domain show that language-based rewards significantly improve the sample efficiency of policy learning, in both sparse and dense reward settings.
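The idea described above, a learned model that scores how well pixel observations match a free-form task description and whose score is added to the environment reward, can be sketched as follows. This is a minimal illustration, not the paper's architecture: `PixL2RModel`, its `relevance` heuristic, and the mixing weight `lam` are hypothetical stand-ins for a trained pixel-and-language encoder.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PixL2RModel:
    """Toy stand-in for a learned model that scores how well recent
    pixel frames match a natural language task description."""
    description: str

    def relevance(self, frames: List[list]) -> float:
        # Placeholder heuristic: a real model would encode the pixels
        # and the description and output a learned score in [0, 1].
        return min(1.0, len(frames) / 10.0)

def shaped_reward(env_reward: float, model: PixL2RModel,
                  frames: List[list], lam: float = 0.1) -> float:
    """Combine the (possibly sparse) environment reward with the
    language-based relevance score as an auxiliary shaping term."""
    return env_reward + lam * model.relevance(frames)

model = PixL2RModel(description="push the red button")
frames = [[0.0] * 4 for _ in range(5)]   # fake pixel observations
r = shaped_reward(0.0, model, frames)    # sparse env reward is 0 here
```

In a sparse reward setting such as the one above, the environment reward is zero almost everywhere, so the language-based term provides the only learning signal until the task is solved.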
