Communicating via Markov Decision Processes

We consider the problem of communicating exogenous information by means of Markov decision process trajectories. This setting, which we call a Markov coding game (MCG), generalizes both source coding and a large class of referential games. MCGs also isolate a problem that is important in decentralized control settings in which cheap talk is not available—namely, they require balancing communication with the associated cost of communicating. We contribute a theoretically grounded approach to MCGs based on maximum entropy reinforcement learning and minimum entropy coupling that we call MEME. Due to recent breakthroughs in approximation algorithms for minimum entropy coupling, MEME is not merely a theoretical algorithm, but can be applied to practical settings. Empirically, we show both that MEME is able to outperform a strong baseline on small MCGs and that MEME is able to achieve strong performance on extremely large MCGs. To the latter point, we demonstrate that MEME is able to losslessly communicate binary images via trajectories of Cartpole and Pong, while simultaneously achieving maximal or near-maximal expected returns, and that it is even capable of performing well in the presence of actuator noise.
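The minimum entropy coupling subroutine mentioned above can be approximated greedily: repeatedly couple as much probability mass as possible between the currently largest remaining entries of the two marginals. The sketch below is a generic illustration of this greedy scheme, not the paper's own implementation; the function name and tolerance are illustrative.

```python
import heapq

def greedy_min_entropy_coupling(p, q, eps=1e-12):
    """Greedy approximate minimum entropy coupling of two
    discrete marginals p and q (each a list of probabilities).

    Repeatedly pops the largest remaining mass from each marginal,
    assigns their minimum to the joint, and pushes back any residual.
    Returns a dict mapping (i, j) -> joint probability.
    """
    # Max-heaps via negated probabilities; skip zero-mass entries.
    ph = [(-v, i) for i, v in enumerate(p) if v > 0]
    qh = [(-v, j) for j, v in enumerate(q) if v > 0]
    heapq.heapify(ph)
    heapq.heapify(qh)
    joint = {}
    while ph and qh:
        pv, i = heapq.heappop(ph)
        qv, j = heapq.heappop(qh)
        m = min(-pv, -qv)  # couple as much mass as possible
        joint[(i, j)] = joint.get((i, j), 0.0) + m
        if -pv - m > eps:  # push back leftover mass, if any
            heapq.heappush(ph, (pv + m, i))
        if -qv - m > eps:
            heapq.heappush(qh, (qv + m, j))
    return joint
```

By construction the returned joint distribution has the given p and q as its row and column marginals (up to floating-point tolerance), while concentrating mass on few cells to keep the joint entropy low.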
