Decentralization of Multiagent Policies by Learning What to Communicate

Effective communication is required for teams of robots to solve sophisticated collaborative tasks. In practice, both the encoding and the semantics of communication are typically defined manually by an expert, regardless of whether the behaviors themselves are bespoke, optimization-based, or learned. We present an agent architecture and training methodology that use neural networks to learn task-oriented communication semantics from the example of a communication-unaware expert policy. A perimeter defense game illustrates the system’s ability to handle dynamically changing numbers of agents and its graceful degradation in performance as communication constraints are tightened or the expert’s observability assumptions are broken.
