Neurosymbolic Transformers for Multi-Agent Communication

We study the problem of inferring communication structures that solve cooperative multi-agent planning problems while minimizing the amount of communication. We quantify the amount of communication as the maximum degree of the communication graph; this metric captures settings where agents have limited bandwidth. Minimizing communication is challenging due to the combinatorial nature of both the decision space and the objective; in particular, the problem cannot be solved simply by training neural networks with gradient descent. We propose a novel algorithm that synthesizes a control policy combining a programmatic communication policy, which generates the communication graph, with a transformer policy network, which chooses actions. Our algorithm first trains the transformer policy, which implicitly generates a “soft” communication graph; it then synthesizes a programmatic communication policy that “hardens” this graph, forming a neurosymbolic transformer. Our experiments demonstrate that our approach synthesizes policies that generate low-degree communication graphs while maintaining near-optimal performance.
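The paper's pipeline trains the transformer first and then synthesizes a programmatic communication policy from its attention patterns. As a minimal sketch of just the soft-to-hard step, the snippet below hardens a soft attention matrix with a simple top-k rule. Everything in it is illustrative rather than taken from the paper: the random projections stand in for learned transformer weights, the names (`soft_communication_graph`, `harden`, `max_degree`) are hypothetical, and the actual synthesis produces a symbolic program rather than this per-step rule.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_communication_graph(states: np.ndarray, d_k: int = 8) -> np.ndarray:
    """Toy stand-in for the transformer's attention: each agent scores every
    other agent from their states and normalizes with a row-wise softmax.
    The random projections below are placeholders for learned weights."""
    n, d = states.shape
    W_q = rng.normal(size=(d, d_k))
    W_k = rng.normal(size=(d, d_k))
    q, k = states @ W_q, states @ W_k
    scores = q @ k.T / np.sqrt(d_k)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

def harden(attention: np.ndarray, max_degree: int) -> np.ndarray:
    """Keep only each agent's max_degree strongest incoming soft edges,
    yielding a hard communication graph with bounded in-degree."""
    hard = np.zeros(attention.shape, dtype=bool)
    for i in range(attention.shape[0]):
        w = attention[i].copy()
        w[i] = -np.inf  # exclude the self-attention edge
        hard[i, np.argsort(w)[-max_degree:]] = True
    return hard

states = rng.normal(size=(5, 4))   # 5 agents with 4-dimensional states
graph = harden(soft_communication_graph(states), max_degree=2)
print(graph.sum(axis=1))           # each agent receives from <= 2 others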
