D++: Structural credit assignment in tightly coupled multiagent domains

Autonomous multi-robot teams can be deployed in complex coordinated exploration tasks to improve exploration performance in terms of both speed and effectiveness. However, the use of multi-robot systems introduces additional challenges. Specifically, in domains where the robots' actions are tightly coupled, coordinating multiple robots to achieve cooperative behavior at the group level is difficult. In this paper, we demonstrate that reward shaping can greatly benefit learning in multi-robot exploration tasks. We propose a novel reward framework based on the idea of counterfactuals to tackle the coordination problem in tightly coupled domains. We show that the proposed algorithm provides superior performance (a 166% performance improvement and a fourfold speed-up in convergence) compared to policies learned using either the global reward or the difference reward [1].
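The counterfactual idea underlying this line of work can be illustrated with the classical difference reward, D_i = G(z) - G(z_{-i}): an agent is credited with the change in the global evaluation that results from removing it from the system. The following is a minimal sketch, not the paper's implementation; the toy global evaluation `global_reward` (POIs on a 1-D line, observed when at least `coupling` rovers are within range) and all parameter names are illustrative assumptions.

```python
def global_reward(positions, pois, coupling=1, obs_radius=2.0):
    """Toy global evaluation G(z): a POI scores 1 point if at least
    `coupling` rovers are within obs_radius of it (tight coupling
    corresponds to coupling > 1)."""
    score = 0.0
    for poi in pois:
        observers = sum(
            1 for p in positions
            if p is not None and abs(p - poi) <= obs_radius
        )
        if observers >= coupling:
            score += 1.0
    return score

def difference_reward(i, positions, pois, coupling=1, obs_radius=2.0):
    """D_i = G(z) - G(z_{-i}): credit agent i with the global score
    minus the counterfactual score computed with agent i removed."""
    g = global_reward(positions, pois, coupling, obs_radius)
    counterfactual = list(positions)
    counterfactual[i] = None  # counterfactually remove agent i
    g_minus_i = global_reward(counterfactual, pois, coupling, obs_radius)
    return g - g_minus_i
```

In loosely coupled settings (coupling of 1), D_i cleanly isolates each agent's contribution; when a POI requires several simultaneous observers, removing a single agent often leaves G unchanged, which is the credit-assignment gap that counterfactual extensions such as the one proposed here are designed to close.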

[1] Jijun Wang. Cooperating Robots for Search and Rescue, 2006.

[2] Stéphane Doncieux, et al. Encouraging Behavioral Diversity in Evolutionary Robotics: An Empirical Study, 2012, Evolutionary Computation.

[3] Peter Stone, et al. A Neural Network-Based Approach to Robot Motion Control, 2007, RoboCup.

[4] Sean Luke, et al. Cooperative Multi-Agent Learning: The State of the Art, 2005, Autonomous Agents and Multi-Agent Systems.

[5] David C. Parkes, et al. An MDP-Based Approach to Online Mechanism Design, 2003, NIPS.

[6] Kenneth A. De Jong, et al. A Cooperative Coevolutionary Approach to Function Optimization, 1994, PPSN.

[7] Evelina Lamma, et al. Belief Revision by Multi-Agent Genetic Search, 2001.

[8] Kagan Tumer, et al. Fitness function shaping in multiagent cooperative coevolutionary algorithms, 2017, Autonomous Agents and Multi-Agent Systems.

[9] Kagan Tumer, et al. Multi-agent reward analysis for learning in noisy domains, 2005, AAMAS '05.

[10] Kagan Tumer, et al. Using Collective Intelligence to Route Internet Traffic, 1998, NIPS.

[11] Sarit Kraus, et al. Multi-robot perimeter patrol in adversarial settings, 2008, IEEE International Conference on Robotics and Automation.

[12] Yoav Shoham, et al. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations, 2009.

[13] Kagan Tumer, et al. Efficient Evaluation Functions for Evolving Coordination, 2008, Evolutionary Computation.

[14] Hong Chen, et al. Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems, 1995, IEEE Transactions on Neural Networks.

[15] Lynne E. Parker, et al. Lifelong Adaptation in Heterogeneous Multi-Robot Teams: Response to Continual Variation in Individual Robot Performance, 2000, Autonomous Robots.

[16] Barbara Messing, et al. An Introduction to MultiAgent Systems, 2002, Künstliche Intelligenz.

[17] Maja J. Mataric, et al. Multi-robot task allocation: analyzing the complexity and optimality of key architectures, 2003, IEEE International Conference on Robotics and Automation.

[18] Jen Jen Chung, et al. Implicit adaptive multi-robot coordination in dynamic environments, 2015, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[19] Dario Floreano, et al. Neuroevolution: from architectures to learning, 2008, Evolutionary Intelligence.

[20] Sam Devlin, et al. Theoretical considerations of potential-based reward shaping for multi-agent systems, 2011, AAMAS.

[21] Kagan Tumer, et al. Efficient Evaluation Functions for Multi-rover Systems, 2004, GECCO.

[22] Karl Tuyls, et al. An Evolutionary Dynamical Analysis of Multi-Agent Learning in Iterated Games, 2005, Autonomous Agents and Multi-Agent Systems.

[23] Kagan Tumer, et al. Shaping fitness functions for coevolving cooperative multiagent systems, 2012, AAMAS.

[24] Kagan Tumer, et al. Coordinating multi-rover systems: evaluation functions for dynamic and noisy environments, 2005, GECCO '05.

[25] Kagan Tumer, et al. Optimizing ballast design of wave energy converters using evolutionary algorithms, 2011, GECCO '11.

[26] Maja J. Mataric, et al. Maximizing Reward in a Non-Stationary Mobile Robot Environment, 2003, Autonomous Agents and Multi-Agent Systems.

[27] Craig Boutilier, et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems, 1998, AAAI/IAAI.

[28] Manuela M. Veloso, et al. Simultaneous Adversarial Multi-Robot Learning, 2003, IJCAI.

[29] David B. Fogel, et al. An introduction to simulated evolutionary optimization, 1994, IEEE Transactions on Neural Networks.

[30] Manuela M. Veloso, et al. Multiagent Systems: A Survey from a Machine Learning Perspective, 2000, Autonomous Robots.

[31] Kagan Tumer, et al. Analyzing and visualizing multiagent rewards in dynamic and stochastic domains, 2008, Autonomous Agents and Multi-Agent Systems.

[32] A. Lapedes, et al. Nonlinear signal processing using neural networks: Prediction and system modelling, 1987.

[33] Lynne E. Parker, et al. ALLIANCE: an architecture for fault tolerant multirobot cooperation, 1998, IEEE Transactions on Robotics and Automation.

[34] Kurt Hornik, et al. Multilayer feedforward networks are universal approximators, 1989, Neural Networks.

[35] Victor R. Lesser, et al. A survey of multi-agent organizational paradigms, 2004, The Knowledge Engineering Review.

[36] Kagan Tumer, et al. Coevolution of heterogeneous multi-robot teams, 2010, GECCO '10.

[37] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.

[38] Edwin D. de Jong, et al. Evolutionary Multi-agent Systems, 2004, PPSN.

[39] Jordan B. Pollack, et al. A game-theoretic and dynamical-systems analysis of selection methods in coevolution, 2005, IEEE Transactions on Evolutionary Computation.

[40] Kagan Tumer, et al. Optimal Payoff Functions for Members of Collectives, 2001, Advances in Complex Systems.

[41] Sam Devlin, et al. Dynamic potential-based reward shaping, 2012, AAMAS.

[42] Maja J. Mataric, et al. Pusher-watcher: an approach to fault-tolerant tightly-coupled robot coordination, 2002, IEEE International Conference on Robotics and Automation.