Interactive Learning from Activity Description

We present a novel interactive learning protocol that enables training request-fulfilling agents by verbally describing their activities. Unlike imitation learning (IL), our protocol allows the teaching agent to provide feedback in the language that is most natural to it. Compared with the scalar reward in reinforcement learning (RL), description feedback is richer and enables improved sample complexity. We develop a probabilistic framework and an algorithm that practically implements our protocol. Empirical results on two challenging request-fulfilling problems demonstrate the strengths of our approach: compared with RL baselines, it is more sample-efficient; compared with IL baselines, it achieves competitive success rates without requiring the teaching agent to demonstrate the desired behavior using the learning agent's actions. Beyond the empirical evaluation, we also provide theoretical guarantees for our algorithm under certain assumptions about the teacher and the environment.
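To make the protocol concrete, the sketch below shows the interaction loop it implies: the learning agent attempts a request, the teaching agent describes what the agent actually did, and the agent learns from the resulting (request, trajectory, description) triple instead of from demonstrations or rewards. This is a minimal illustration under stated assumptions, not the paper's actual algorithm: the `agent` and `teacher` objects and their `act`, `describe`, and `update` methods are hypothetical interfaces standing in for the learning agent, the teacher, and the probabilistic update the paper develops.

```python
from typing import Sequence

def train(agent, teacher, requests: Sequence[str], num_episodes: int) -> None:
    """Sketch of the interactive protocol: act, get a verbal description
    of the activity, and learn from (request, trajectory, description)
    triples rather than from demonstrations (IL) or scalar rewards (RL)."""
    for t in range(num_episodes):
        request = requests[t % len(requests)]
        trajectory = agent.act(request)             # attempt the request in the environment
        description = teacher.describe(trajectory)  # feedback in the teacher's own language
        # The description labels the agent's actual behavior: behavior whose
        # description matches the request can be reinforced, while a mismatched
        # description still reveals what the behavior did accomplish.
        agent.update(request, trajectory, description)
```

Because the teacher only has to describe behavior rather than produce it, this loop imposes no requirement that the teacher can act with the learner's action space, which is the key difference from imitation learning noted in the abstract.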
