Revisiting the Master-Slave Architecture in Multi-Agent Deep Reinforcement Learning

Many tasks in artificial intelligence require the collaboration of multiple agents. We examine deep reinforcement learning for multi-agent domains. Recent research efforts often take one of two seemingly conflicting perspectives: the decentralized perspective, where each agent has its own controller, and the centralized perspective, where a single larger model controls all agents. In this regard, we revisit the idea of the master-slave architecture, incorporating both perspectives within one framework. Such a hierarchical structure naturally allows each perspective to leverage the advantages of the other. The idea of combining both perspectives is intuitive and well motivated by many real-world systems; however, out of a variety of possible realizations, we highlight three key ingredients, i.e. composed action representation, learnable communication and independent reasoning. With network designs that facilitate these explicitly, our proposal consistently outperforms the latest competing methods, both in synthetic experiments and when applied to challenging StarCraft micromanagement tasks.
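To make the three ingredients concrete, below is a minimal, hypothetical sketch of a master-slave controller in PyTorch. It is not the paper's exact network; module names, sizes, and the pooling of observations into the master are illustrative assumptions. It only shows how a centralized master cell can broadcast a learned message (learnable communication) to per-agent slave cells (independent reasoning), whose outputs are composed with that message to produce each agent's action logits (composed action representation).

```python
import torch
import torch.nn as nn

class MasterSlaveController(nn.Module):
    """Illustrative master-slave sketch (assumed design, not the paper's exact model).

    - Centralized view: one master GRU cell sees the pooled observations.
    - Decentralized view: one slave GRU cell per agent sees only that agent's observation.
    - Composed action: each agent's logits combine its slave state with the master's message.
    """

    def __init__(self, num_agents, obs_dim, msg_dim, hidden_dim, num_actions):
        super().__init__()
        self.num_agents = num_agents
        # Master controller over the concatenation of all agents' observations.
        self.master = nn.GRUCell(obs_dim * num_agents, hidden_dim)
        # Learnable communication: the master's state is projected into a message.
        self.to_msg = nn.Linear(hidden_dim, msg_dim)
        # Independent reasoning: one recurrent controller per agent.
        self.slaves = nn.ModuleList(
            [nn.GRUCell(obs_dim, hidden_dim) for _ in range(num_agents)]
        )
        # Composed action representation: slave state + master message -> action logits.
        self.action_head = nn.Linear(hidden_dim + msg_dim, num_actions)

    def forward(self, obs, master_h, slave_h):
        # obs: (num_agents, obs_dim); master_h: (1, hidden_dim); slave_h: list of (1, hidden_dim)
        master_h = self.master(obs.reshape(1, -1), master_h)
        msg = self.to_msg(master_h)  # broadcast message shared by all agents
        logits, new_slave_h = [], []
        for i in range(self.num_agents):
            h_i = self.slaves[i](obs[i:i + 1], slave_h[i])  # agent-local reasoning
            logits.append(self.action_head(torch.cat([h_i, msg], dim=-1)))
            new_slave_h.append(h_i)
        return torch.stack(logits, dim=0), master_h, new_slave_h
```

In this sketch the composed logits would feed a standard policy-gradient or Q-learning objective; the key point is only that the centralized and decentralized controllers are trained jointly end to end, with the message acting as the channel between them.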
