Towards Learning to Speak and Hear Through Multi-Agent Communication over a Continuous Acoustic Channel

While multi-agent reinforcement learning has been used as an effective means to study emergent communication between agents, existing work has focused almost exclusively on communication with discrete symbols. Human communication often takes place (and emerged) over a continuous acoustic channel; human infants acquire language in large part through continuous signalling with their caregivers. We therefore ask: Can we observe emergent language between agents that communicate over a continuous channel and are trained through reinforcement learning? And if so, what is the impact of channel characteristics on the emerging language? We propose an environment and training methodology for an initial exploration of these questions. We use a simple messaging environment where a “speaker” agent needs to convey a concept to a “listener”. The speaker is equipped with a vocoder that maps symbols to a continuous waveform; this waveform is passed over a lossy continuous channel, and the listener needs to map the received signal back to the concept. Using deep Q-learning, we show that basic compositionality emerges in the learned language representations. We find that noise in the communication channel is essential when conveying unseen concept combinations, and we show that the emergent communication can be grounded by introducing a caregiver predisposed to “hearing” or “speaking” English. Finally, we describe how our platform serves as a starting point for future work that combines deep reinforcement learning and multi-agent systems to study continuous signalling in language learning and emergence.
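
To make the described pipeline concrete, below is a minimal sketch of a speaker → vocoder → noisy channel → listener round trip. This is not the paper's implementation: the module names, the symbol-to-sinusoid vocoder, the additive-noise channel, and the template-matching listener are all illustrative assumptions; in the paper the speaker and listener are learned with deep Q-learning rather than hand-coded.

```python
# Illustrative sketch (not the authors' code) of the pipeline:
# concept -> speaker symbols -> vocoder waveform -> noisy channel -> listener concept.
import numpy as np

RNG = np.random.default_rng(0)
N_SYMBOLS = 8          # discrete symbols the speaker can emit (assumed)
SAMPLE_RATE = 8000     # Hz (assumed)
SEG_LEN = 200          # waveform samples per symbol (assumed)
NOISE_STD = 0.1        # channel noise level (assumed)

def speaker(concept: int, message_len: int = 3) -> list[int]:
    """Toy speaker policy: deterministically encode a concept as symbols.
    (A learned Q-network in the paper.)"""
    return [(concept + i) % N_SYMBOLS for i in range(message_len)]

def vocoder(symbols: list[int]) -> np.ndarray:
    """Map each symbol to a short sinusoid at a symbol-specific frequency."""
    t = np.arange(SEG_LEN) / SAMPLE_RATE
    segments = [np.sin(2 * np.pi * (300 + 100 * s) * t) for s in symbols]
    return np.concatenate(segments)

def channel(waveform: np.ndarray) -> np.ndarray:
    """Lossy continuous channel modelled as additive Gaussian noise."""
    return waveform + RNG.normal(0.0, NOISE_STD, size=waveform.shape)

def listener(waveform: np.ndarray, message_len: int = 3) -> int:
    """Toy listener: recover symbols by matched filtering against the
    vocoder's sinusoid templates, then invert the speaker's encoding.
    (A learned network in the paper.)"""
    t = np.arange(SEG_LEN) / SAMPLE_RATE
    templates = np.stack([np.sin(2 * np.pi * (300 + 100 * s) * t)
                          for s in range(N_SYMBOLS)])
    symbols = []
    for i in range(message_len):
        segment = waveform[i * SEG_LEN:(i + 1) * SEG_LEN]
        symbols.append(int(np.argmax(templates @ segment)))
    return symbols[0] % N_SYMBOLS  # first symbol carries the concept here

concept = 5
received = channel(vocoder(speaker(concept)))
print("decoded concept:", listener(received), "target:", concept)
```

In the actual setup both agents would be neural networks updated from a shared task reward, so the symbol-to-waveform mapping and its decoding are learned rather than fixed as in this sketch.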
