Interactive Language Acquisition with One-shot Visual Concept Learning through a Conversational Game

Building intelligent agents that can communicate with and learn from humans in natural language is of great value. Supervised language learning is limited in that it mainly captures the statistics of its training data; it adapts poorly to new scenarios and cannot acquire new knowledge without inefficient retraining or catastrophic forgetting. We highlight the perspective that conversational interaction serves as a natural interface both for language learning and for acquiring novel knowledge, and we propose a joint imitation and reinforcement approach for grounded language learning through an interactive conversational game. An agent trained with this approach can actively acquire information by asking questions about novel objects and can use the just-learned knowledge in subsequent conversations in a one-shot fashion. Comparisons with other methods verify the effectiveness of the proposed approach.
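To make the idea of jointly optimizing imitation and reinforcement objectives concrete, the sketch below combines a cross-entropy imitation term on teacher utterances with a REINFORCE-style term driven by the conversational reward. This is a minimal illustration under assumed details (a toy linear policy over a small vocabulary, a toy state representation, and a per-step scalar reward), not the paper's actual model or training procedure.

```python
# Minimal sketch of a joint imitation + reinforcement update.
# All sizes, the linear policy, and the reward handling are
# illustrative assumptions, not details taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 8    # toy vocabulary size (assumption)
STATE = 16   # toy state/feature size (assumption)
LR = 0.1     # learning rate (assumption)

W = rng.normal(scale=0.1, size=(STATE, VOCAB))  # toy linear policy

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def step(state, teacher_token=None, reward=0.0):
    """One combined update: imitate the teacher token if one is given,
    and reinforce the agent's own sampled token with the game reward."""
    global W
    probs = softmax(state @ W)
    grad = np.zeros_like(W)

    # Imitation term: cross-entropy gradient toward the teacher's token.
    if teacher_token is not None:
        target = np.zeros(VOCAB)
        target[teacher_token] = 1.0
        grad += np.outer(state, probs - target)

    # Reinforcement term: REINFORCE on the sampled token, weighted by
    # the reward received from the conversational game.
    action = rng.choice(VOCAB, p=probs)
    onehot = np.zeros(VOCAB)
    onehot[action] = 1.0
    grad += reward * np.outer(state, probs - onehot)

    W -= LR * grad
    return action

# Example usage: one turn where the agent imitates the teacher naming a
# novel object, and a later turn where a positive reward reinforces an
# answer the agent produced itself.
state = rng.normal(size=STATE)
step(state, teacher_token=3)   # imitation only (no reward yet)
step(state, reward=1.0)        # reinforcement only
```

In this sketch the imitation term shapes fluent, teacher-like language, while the reinforcement term grounds utterances in task outcomes; the two gradients are simply summed here, which is one plausible way to combine them rather than the paper's specific weighting.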
