Simultaneous Adversarial Multi-Robot Learning

Multi-robot learning combines all of the challenges of robot learning with all of the challenges of multiagent learning. There has been a great deal of recent research on multiagent reinforcement learning in stochastic games, which are the intuitive extension of MDPs to multiple agents. This recent work, although general, has only been applied to small games with at most hundreds of states. Robot tasks, on the other hand, have continuous and often complex state and action spaces. Robot learning tasks therefore demand approximation and generalization techniques, which have received extensive attention only in single-agent learning. In this paper we introduce GraWoLF, a general-purpose, scalable, multiagent learning algorithm. It combines gradient-based policy learning techniques with the WoLF ("Win or Learn Fast") variable learning rate. We apply this algorithm to an adversarial multi-robot task in which both robots learn simultaneously. We show results of learning both in simulation and on real robots. These results demonstrate that GraWoLF can learn successful policies, overcoming the many challenges of multi-robot learning.
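
To make the "Win or Learn Fast" idea concrete, the following is a minimal sketch on a toy two-player zero-sum matrix game (matching pennies), not the GraWoLF algorithm itself, which uses policy gradient techniques with function approximation. Everything here is an illustrative assumption: the function names (`value`, `wolf_step_size`, `simulate`), the step sizes, the starting policies, and the choice to judge "winning" by comparing the current policy's expected payoff with that of the player's own running-average policy.

```python
# Toy illustration of a WoLF-style variable learning rate with gradient ascent.
# Assumptions (not from the paper): matching pennies payoffs, the step sizes,
# the initial policies, and the running-average "winning" test.

def clip01(x):
    """Project a probability back onto the interval [0, 1]."""
    return min(1.0, max(0.0, x))

def value(p, q):
    """Expected payoff to player 1 when player 1 plays heads with
    probability p and player 2 plays heads with probability q.
    Player 1 wins (+1) on a match and loses (-1) otherwise."""
    return (2.0 * p - 1.0) * (2.0 * q - 1.0)

def wolf_step_size(current_value, average_value, delta_win, delta_lose):
    """Use the small step when winning, the large step when losing."""
    return delta_win if current_value > average_value else delta_lose

def simulate(steps=5000, delta_win=0.001, delta_lose=0.004):
    p, q = 0.9, 0.2          # hypothetical starting policies
    p_avg, q_avg = p, q      # running averages of each player's policy
    for t in range(1, steps + 1):
        # Each player compares its current policy with its average policy,
        # both evaluated against the opponent's current policy.
        v1, v1_avg = value(p, q), value(p_avg, q)
        v2, v2_avg = -value(p, q), -value(p, q_avg)
        step_p = wolf_step_size(v1, v1_avg, delta_win, delta_lose)
        step_q = wolf_step_size(v2, v2_avg, delta_win, delta_lose)
        # Gradients of each player's own expected payoff.
        grad_p = 2.0 * (2.0 * q - 1.0)
        grad_q = -2.0 * (2.0 * p - 1.0)
        # Simultaneous projected gradient ascent.
        p, q = clip01(p + step_p * grad_p), clip01(q + step_q * grad_q)
        # Incremental update of the running averages.
        p_avg += (p - p_avg) / (t + 1)
        q_avg += (q - q_avg) / (t + 1)
    return p, q

if __name__ == "__main__":
    print(simulate())
```

The only WoLF-specific piece is `wolf_step_size`: a player that is doing better than its average policy adapts cautiously, while a player that is doing worse adapts quickly. The rest is ordinary simultaneous gradient ascent.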
