Multi-Agent Reinforcement Learning by the Actor-Critic Model with an Attention Interface

Abstract: Multi-agent reinforcement learning algorithms perform well in many scenarios, but many of them struggle in partially observable environments. When agents cannot perceive the full environment state, training becomes unstable and may fail to converge, especially in large-scale multi-agent environments. To improve interactions among homogeneous agents in a partially observable environment, we propose a novel multi-agent actor-critic model with a visual attention interface. First, a recurrent visual attention interface extracts a latent state from each agent’s partial observation. These latent states allow agents to focus on several local environments: each agent has a complete perception of its local environment, and the intricate multi-agent environment is reduced to the interactions among the agents that share a local environment. The proposed method trains multi-agent systems with centralized training and decentralized execution. Because the number of agents in a local environment is uncertain, the joint action of agents is approximated using mean-field theory. Experimental results on the simulation platform suggest that our model outperforms the baselines when training large-scale multi-agent systems in partially observable environments.
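The mean-field step mentioned in the abstract can be illustrated with a minimal sketch. It assumes discrete actions and uses one common form of the mean-field approximation, in which the joint action of an agent's neighbors is replaced by the average of their one-hot encoded actions. The abstract does not give the paper's exact formulation, so the function name `mean_action`, its parameters, and the all-zero fallback for an empty neighborhood are hypothetical choices made only for illustration.

```python
from typing import List, Sequence


def mean_action(neighbor_actions: Sequence[int], n_actions: int) -> List[float]:
    """Average the one-hot encodings of the neighbors' discrete actions.

    Illustrative assumption, not the paper's stated method: the result has a
    fixed length (n_actions) no matter how many neighbors share the local
    environment, which is why it can stand in for a joint action whose size
    is not known in advance.
    """
    if n_actions <= 0:
        raise ValueError("n_actions must be positive")

    counts = [0.0] * n_actions
    for action in neighbor_actions:
        if not 0 <= action < n_actions:
            raise ValueError(f"action {action} is outside [0, {n_actions})")
        counts[action] += 1.0

    # Hypothetical convention: with no neighbors, return an all-zero vector.
    if not neighbor_actions:
        return counts

    n_neighbors = len(neighbor_actions)
    return [c / n_neighbors for c in counts]


if __name__ == "__main__":
    # Four neighbors in the same local environment, four possible actions.
    print(mean_action([0, 1, 1, 3], n_actions=4))
```

Under these assumptions, a critic could take the agent's own action together with this fixed-size vector as input, so its input dimension does not depend on how many agents are nearby.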
