Entropy Minimization In Emergent Languages

There is growing interest in studying the languages that emerge when neural agents are jointly trained to solve tasks requiring communication through a discrete channel. We investigate here the information-theoretic complexity of such languages, focusing on the basic two-agent, one-exchange setup. We find that, under common training procedures, the emergent languages are subject to an entropy minimization pressure that has also been detected in human language, whereby the mutual information between the communicating agent's inputs and the messages is minimized, within the range afforded by the need for successful communication. That is, emergent languages are (nearly) as simple as the task they are developed for allows them to be. This pressure is amplified as we increase the discreteness of the communication channel. Further, we observe that stronger discrete-channel-driven entropy minimization leads to representations with increased robustness to overfitting and adversarial attacks. We conclude by discussing the implications of our findings for the study of natural and artificial communication systems.
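A minimal formal sketch of the pressure described above, stated in standard information-theoretic notation that is our own assumption rather than taken from the abstract: with sender input $X$, message $M$, and task-relevant target $Y$, the claim is that training drives the sender toward the lowest-complexity code that still supports the task,

$$\min_{p(m \mid x)} \; I(X; M) \quad \text{subject to } I(M; Y) \text{ (equivalently, task success) staying near its maximum.}$$

Since $I(X; M) = H(M) - H(M \mid X)$, for a deterministic sender ($H(M \mid X) = 0$) this amounts to minimizing the message entropy $H(M)$, mirroring a deterministic information-bottleneck-style objective $\min_{p(m \mid x)} \; H(M) - \beta\, I(M; Y)$.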
