Domain-Aware Multiagent Reinforcement Learning in Navigation

Multiagent reinforcement learning has shown success in guiding agents' behaviour in systems of real-world significance. In these frameworks, agents learn how to interact with the environment and with other agents while satisfying their objectives. Unfortunately, the complexity of real-world problems demands a significant investment of computational resources before multiagent reinforcement learning methods can deliver results. By incorporating a priori domain knowledge, however, more computationally efficient algorithms can be developed. In this paper, we present, for the first time, a Domain-Aware Multiagent Actor-Critic (DAMAC) algorithm, which uses domain-specific solvers to integrate domain knowledge with the centralised-learning, decentralised-execution approach to multiagent reinforcement learning. Our experiments show that our algorithm achieves substantially high reward and reduces training time by two orders of magnitude compared with other multiagent reinforcement learning algorithms. This enables the adoption of this powerful framework in more resource-constrained scenarios.
