Deep Reinforcement Learning for Swarm Systems

Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, the observation vector for decentralized decision making is represented by a concatenation of the (local) information an agent gathers about other agents. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents, as it does not exploit two fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions, where we treat the agents as samples and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions, and neural networks trained end-to-end. We evaluate the representation on two well-known problems from the swarm literature in a globally and a locally observable setup. For the local setup, we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents, facilitating the development of complex collective strategies.
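To make the two properties concrete, the following is a minimal sketch of an empirical mean embedding with radial basis function features. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function names `rbf_features` and `mean_embedding`, the randomly placed RBF centers, the bandwidth, and the 2-D neighbor observations are all hypothetical choices made for this example.

```python
import numpy as np


def rbf_features(obs, centers, bandwidth=1.0):
    """Map each neighbor observation (one row of obs) to RBF activations.

    obs:     array of shape (n_neighbors, d)
    centers: array of shape (n_centers, d), hypothetical fixed RBF centers
    returns: array of shape (n_neighbors, n_centers)
    """
    # Pairwise squared distances between observations and centers.
    sq_dist = ((obs[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_embedding(obs, centers, bandwidth=1.0):
    """Empirical mean embedding: the average feature vector over neighbors."""
    if obs.shape[0] == 0:
        # Assumption for this sketch: an agent with no neighbors sees zeros.
        return np.zeros(centers.shape[0])
    return rbf_features(obs, centers, bandwidth).mean(axis=0)


rng = np.random.default_rng(0)
centers = rng.uniform(-1.0, 1.0, size=(8, 2))    # 8 hypothetical RBF centers
neighbors = rng.uniform(-1.0, 1.0, size=(5, 2))  # 5 observed neighbors

emb = mean_embedding(neighbors, centers)

# (i) Interchangeable agents: reordering the neighbors leaves the input unchanged.
emb_permuted = mean_embedding(neighbors[::-1], centers)

# (ii) Number of agents: the input size depends only on the number of features.
emb_large = mean_embedding(rng.uniform(-1.0, 1.0, size=(50, 2)), centers)
```

Here `emb`, `emb_permuted`, and `emb_large` all have length 8, so a policy network with a fixed input size can consume them regardless of how many neighbors an agent observes. A concatenated observation vector would instead grow with the swarm size and change under reordering. Replacing `rbf_features` with a histogram binning or a learned network would correspond to the other feature spaces named in the abstract.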
