A Survey on Multiagent Reinforcement Learning Towards Multi-Robot Systems

Multiagent reinforcement learning for multi-robot systems is a challenging problem in both robotics and artificial intelligence. Growing interest in both theoretical research and practical applications has prompted considerable effort toward solving it. However, many difficulties remain in scaling multiagent reinforcement learning up to multi-robot systems. The main objective of this paper is to survey multiagent reinforcement learning in multi-robot systems, based on the literature the authors collected. After reviewing some important advances in the field, the paper analyzes several challenging problems and closes with concluding remarks from the authors' perspective.
