Zero-shot Task Adaptation using Natural Language

Imitation learning and instruction-following are two common approaches to communicate a user's intent to a learning agent. However, as the complexity of tasks grows, it may be beneficial to use both demonstrations and language to communicate with an agent. In this work, we propose a novel setting in which, given a demonstration of a task (the source task) and a natural language description of the differences between the demonstrated task and a related but different task (the target task), our goal is to train an agent to complete the target task in a zero-shot setting, that is, without any demonstrations for the target task. To this end, we introduce Language-Aided Reward and Value Adaptation (LARVA), which, given a source demonstration and a linguistic description of how the target task differs, learns to output either a reward function or a value function that accurately reflects the target task. Our experiments show that, on a diverse set of adaptations, our approach completes more than 95% of target tasks when using template-based descriptions and more than 70% when using free-form natural language.
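To make the proposed setting concrete, below is a minimal PyTorch sketch of one plausible way a reward/value predictor could be conditioned jointly on a source demonstration and a natural language description of the task difference. The module names, encoder choices, layer sizes, and fusion scheme are illustrative assumptions for this sketch and are not taken from the LARVA architecture itself.

# Minimal sketch (assumed architecture, not the paper's): condition a scalar
# reward/value predictor on (i) an encoding of the source demonstration and
# (ii) an encoding of the language description of how the target task differs.
import torch
import torch.nn as nn


class LanguageAidedValuePredictor(nn.Module):
    def __init__(self, state_dim, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        # Encode the source demonstration (a sequence of states) with a GRU.
        self.demo_encoder = nn.GRU(state_dim, hidden_dim, batch_first=True)
        # Encode the template-based or free-form description with a GRU over
        # learned word embeddings.
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.lang_encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Fuse both encodings with a candidate target-task state and predict
        # a scalar reward/value for that state.
        self.head = nn.Sequential(
            nn.Linear(2 * hidden_dim + state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, demo_states, description_tokens, query_state):
        # demo_states: (batch, demo_len, state_dim) states from the source demo
        # description_tokens: (batch, desc_len) integer token ids
        # query_state: (batch, state_dim) state in the *target* task
        _, demo_h = self.demo_encoder(demo_states)
        _, lang_h = self.lang_encoder(self.word_embed(description_tokens))
        fused = torch.cat([demo_h[-1], lang_h[-1], query_state], dim=-1)
        return self.head(fused).squeeze(-1)  # (batch,) predicted values


if __name__ == "__main__":
    model = LanguageAidedValuePredictor(state_dim=10, vocab_size=500)
    values = model(
        torch.randn(2, 25, 10),            # two source demonstrations
        torch.randint(0, 500, (2, 12)),    # two tokenized descriptions
        torch.randn(2, 10),                # two target-task query states
    )
    print(values.shape)  # torch.Size([2])

Under this sketch, the predicted scalar for a target-task state would serve as a reward or value estimate, allowing a standard reinforcement learning agent to be trained on the target task without target demonstrations.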
