Neurosymbolic Transformers for Multi-Agent Communication

We study the problem of inferring communication structures that solve cooperative multi-agent planning problems while minimizing the amount of communication. We quantify the amount of communication as the maximum degree of the communication graph; this metric captures settings where agents have limited bandwidth. Minimizing communication is challenging due to the combinatorial nature of both the decision space and the objective; in particular, the problem cannot be solved simply by training neural networks with gradient descent. We propose a novel algorithm that synthesizes a control policy combining a programmatic communication policy, which generates the communication graph, with a transformer policy network, which chooses actions. Our algorithm first trains the transformer policy, which implicitly generates a “soft” communication graph; it then synthesizes a programmatic communication policy that “hardens” this graph, forming a neurosymbolic transformer. Our experiments demonstrate that our approach synthesizes policies that generate low-degree communication graphs while maintaining near-optimal performance.
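The paper's pipeline trains the transformer first and then synthesizes a programmatic communication policy from its attention patterns. As a minimal sketch of just the soft-to-hard step, the snippet below hardens a soft attention matrix with a simple top-k rule. Everything in it is illustrative rather than taken from the paper: the random projections stand in for learned transformer weights, the names (`soft_communication_graph`, `harden`, `max_degree`) are hypothetical, and the actual synthesis produces a symbolic program rather than this per-step rule.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_communication_graph(states: np.ndarray, d_k: int = 8) -> np.ndarray:
    """Toy stand-in for the transformer's attention: each agent scores every
    other agent from their states and normalizes with a row-wise softmax.
    The random projections below are placeholders for learned weights."""
    n, d = states.shape
    W_q = rng.normal(size=(d, d_k))
    W_k = rng.normal(size=(d, d_k))
    q, k = states @ W_q, states @ W_k
    scores = q @ k.T / np.sqrt(d_k)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

def harden(attention: np.ndarray, max_degree: int) -> np.ndarray:
    """Keep only each agent's max_degree strongest incoming soft edges,
    yielding a hard communication graph with bounded in-degree."""
    hard = np.zeros(attention.shape, dtype=bool)
    for i in range(attention.shape[0]):
        w = attention[i].copy()
        w[i] = -np.inf  # exclude the self-attention edge
        hard[i, np.argsort(w)[-max_degree:]] = True
    return hard

states = rng.normal(size=(5, 4))   # 5 agents with 4-dimensional states
graph = harden(soft_communication_graph(states), max_degree=2)
print(graph.sum(axis=1))           # each agent receives from <= 2 others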
