Learning to Gather without Communication

A standard belief about collective behavior is that it emerges from simple individual rules. Most mathematical research on such collective behavior starts from imperative individual rules, such as "always go to the center." But how could an (optimal) individual rule emerge within a short period of the group's lifetime, especially when communication is not available? We argue that such rules can in fact emerge within a group in a short span of time via collective (multi-agent) reinforcement learning, i.e., learning through rewards and punishments. We consider the gathering problem: several agents (social animals, swarming robots, ...) must gather at the same position, which is not determined in advance. They must do so without communicating their planned decisions, relying only on the observed positions of the other agents. We present the first experimental evidence that a gathering behavior can be learned without communication in a partially observable environment. The learned behavior has the same properties as a self-stabilizing distributed algorithm: processes can gather from any initial state (and thus tolerate any transient failure). Moreover, we show that the group can tolerate the sudden loss of up to 90\% of its agents without significant impact on the behavior.
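To make the learning setup concrete, the following is a minimal sketch of the kind of collective reinforcement learning described above: independent Q-learning agents on a ring, each observing only the relative positions of the other agents (no communication) and rewarded for reducing the group's dispersion. All names and parameters here (N_AGENTS, GRID, the spread-based reward) are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import random
from collections import defaultdict

# Hypothetical toy setting: N agents on a ring of GRID cells. Each agent sees
# only the direction (left/right/same cell) of every other agent, and all
# agents share a reward for shrinking the number of occupied cells.
N_AGENTS = 5
GRID = 20                      # positions 0..GRID-1 on a ring
ACTIONS = (-1, 0, +1)          # step left, stay, step right
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

def observe(i, positions):
    """Local observation: sign of the ring offset to each other agent."""
    me = positions[i]
    return tuple(
        0 if p == me else (1 if (p - me) % GRID <= GRID // 2 else -1)
        for j, p in enumerate(positions) if j != i
    )

def spread(positions):
    """Group dispersion: number of distinct occupied cells (1 = gathered)."""
    return len(set(positions))

q = [defaultdict(float) for _ in range(N_AGENTS)]  # one Q-table per agent

for episode in range(2000):
    positions = [random.randrange(GRID) for _ in range(N_AGENTS)]
    for step in range(50):
        obs = [observe(i, positions) for i in range(N_AGENTS)]
        acts = []
        for i in range(N_AGENTS):
            if random.random() < EPS:                       # explore
                acts.append(random.choice(ACTIONS))
            else:                                           # exploit
                acts.append(max(ACTIONS, key=lambda a: q[i][(obs[i], a)]))
        old_spread = spread(positions)
        positions = [(p + a) % GRID for p, a in zip(positions, acts)]
        reward = old_spread - spread(positions)  # shared: getting closer pays
        for i in range(N_AGENTS):
            nxt = observe(i, positions)
            best_next = max(q[i][(nxt, a)] for a in ACTIONS)
            key = (obs[i], acts[i])
            q[i][key] += ALPHA * (reward + GAMMA * best_next - q[i][key])

# Greedy rollout from a fresh random start: with enough training, the group
# should collapse onto one (or very few) cells within a few dozen steps.
positions = [random.randrange(GRID) for _ in range(N_AGENTS)]
for _ in range(50):
    positions = [
        (positions[i]
         + max(ACTIONS, key=lambda a: q[i][(observe(i, positions), a)])) % GRID
        for i in range(N_AGENTS)
    ]
print("occupied cells after greedy rollout:", spread(positions))
```

Because each agent keeps its own Q-table and the observation depends only on the other agents' positions, nothing in this sketch requires a communication channel, which mirrors the constraint in the abstract; the shared reward is one simple (assumed) way to align the agents' incentives.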
