Zero-shot Task Adaptation using Natural Language

Imitation learning and instruction-following are two common approaches to communicate a user's intent to a learning agent. However, as the complexity of tasks grows, it may be beneficial to use both demonstrations and language to communicate with an agent. In this work, we propose a novel setting in which, given a demonstration of a task (the source task) and a natural language description of the differences between the demonstrated task and a related but different task (the target task), our goal is to train an agent to complete the target task in a zero-shot setting, that is, without any demonstrations for the target task. To this end, we introduce Language-Aided Reward and Value Adaptation (LARVA), which, given a source demonstration and a linguistic description of how the target task differs, learns to output either a reward function or a value function that accurately reflects the target task. Our experiments show that, on a diverse set of adaptations, our approach completes more than 95% of target tasks when using template-based descriptions and more than 70% when using free-form natural language.
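To make the proposed setting concrete, below is a minimal PyTorch sketch of one plausible way a reward/value predictor could be conditioned jointly on a source demonstration and a natural language description of the task difference. The module names, encoder choices, layer sizes, and fusion scheme are illustrative assumptions for this sketch and are not taken from the LARVA architecture itself.

# Minimal sketch (assumed architecture, not the paper's): condition a scalar
# reward/value predictor on (i) an encoding of the source demonstration and
# (ii) an encoding of the language description of how the target task differs.
import torch
import torch.nn as nn


class LanguageAidedValuePredictor(nn.Module):
    def __init__(self, state_dim, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        # Encode the source demonstration (a sequence of states) with a GRU.
        self.demo_encoder = nn.GRU(state_dim, hidden_dim, batch_first=True)
        # Encode the template-based or free-form description with a GRU over
        # learned word embeddings.
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.lang_encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Fuse both encodings with a candidate target-task state and predict
        # a scalar reward/value for that state.
        self.head = nn.Sequential(
            nn.Linear(2 * hidden_dim + state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, demo_states, description_tokens, query_state):
        # demo_states: (batch, demo_len, state_dim) states from the source demo
        # description_tokens: (batch, desc_len) integer token ids
        # query_state: (batch, state_dim) state in the *target* task
        _, demo_h = self.demo_encoder(demo_states)
        _, lang_h = self.lang_encoder(self.word_embed(description_tokens))
        fused = torch.cat([demo_h[-1], lang_h[-1], query_state], dim=-1)
        return self.head(fused).squeeze(-1)  # (batch,) predicted values


if __name__ == "__main__":
    model = LanguageAidedValuePredictor(state_dim=10, vocab_size=500)
    values = model(
        torch.randn(2, 25, 10),            # two source demonstrations
        torch.randint(0, 500, (2, 12)),    # two tokenized descriptions
        torch.randn(2, 10),                # two target-task query states
    )
    print(values.shape)  # torch.Size([2])

Under this sketch, the predicted scalar for a target-task state would serve as a reward or value estimate, allowing a standard reinforcement learning agent to be trained on the target task without target demonstrations.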
