Multi-agent Reinforcement Learning: An Overview

Multi-agent systems can be used to address problems in a variety of domains, including robotics, distributed control, telecommunications, and economics. The complexity of many tasks arising in these domains makes them difficult to solve with preprogrammed agent behaviors. The agents must instead discover a solution on their own, using learning. A significant part of the research on multi-agent learning concerns reinforcement learning techniques. This chapter reviews a representative selection of multi-agent reinforcement learning algorithms for fully cooperative, fully competitive, and more general (neither cooperative nor competitive) tasks. The benefits and challenges of multi-agent reinforcement learning are described. A central challenge in the field is the formal statement of a multi-agent learning goal; this chapter reviews the learning goals proposed in the literature. The problem domains where multi-agent reinforcement learning techniques have been applied are briefly discussed. Several multi-agent reinforcement learning algorithms are applied to an illustrative example involving the coordinated transportation of an object by two cooperative robots. In an outlook for the multi-agent reinforcement learning field, a set of important open issues are identified, and promising research directions to address these issues are outlined.

[1]  Olivier Buffet,et al.  Shaping multi-agent systems with gradient reinforcement learning , 2007, Autonomous Agents and Multi-Agent Systems.

[2]  Robert A. Lordo,et al.  Learning from Data: Concepts, Theory, and Methods , 2001, Technometrics.

[3]  William T. B. Uther,et al.  Adversarial Reinforcement Learning , 2003 .

[4]  Michail G. Lagoudakis,et al.  Coordinated Reinforcement Learning , 2002, ICML.

[5]  R. Paul Wiegand,et al.  Biasing Coevolutionary Search for Optimal Multiagent Behaviors , 2006, IEEE Transactions on Evolutionary Computation.

[6]  Yishay Mansour,et al.  Nash Convergence of Gradient Dynamics in General-Sum Games , 2000, UAI.

[7]  Klaus Debes,et al.  A reinforcement learning based neural multiagent system for control of a combustion process , 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium.

[8]  Hamid R. Berenji,et al.  A convergent actor-critic-based FRL algorithm with application to power management of wireless transmitters , 2003, IEEE Trans. Fuzzy Syst..

[9]  Manuela Veloso,et al.  An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning , 2000 .

[10]  Dimitri P. Bertsekas,et al.  Dynamic Programming and Optimal Control 3rd Edition, Volume II , 2010 .

[11]  Michael P. Wellman,et al.  Nash Q-Learning for General-Sum Stochastic Games , 2003, J. Mach. Learn. Res..

[12]  Claude F. Touzet,et al.  Neural reinforcement learning for behaviour synthesis , 1997, Robotics Auton. Syst..

[13]  Martin A. Riedmiller,et al.  Karlsruhe Brainstormers - A Reinforcement Learning Approach to Robotic Soccer , 2000, RoboCup.

[14]  Ralph Arnote,et al.  Hong Kong (China) , 1996, OECD/G20 Base Erosion and Profit Shifting Project.

[15]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[16]  William S. Lovejoy,et al.  Computationally Feasible Bounds for Partially Observed Markov Decision Processes , 1991, Oper. Res..

[17]  Stefan Schaal,et al.  Natural Actor-Critic , 2003, Neurocomputing.

[18]  Ann Nowé,et al.  Evolutionary game theory and multi-agent reinforcement learning , 2005, The Knowledge Engineering Review.

[19]  Lionel Jouffe,et al.  Fuzzy inference system learning by reinforcement methods , 1998, IEEE Trans. Syst. Man Cybern. Part C.

[20]  Makoto Yokoo,et al.  Intelligent Agents: Specification, Modeling, and Applications , 2001, Lecture Notes in Computer Science.

[21]  T. Başar,et al.  Dynamic Noncooperative Game Theory , 1982 .

[22]  Bart De Schutter,et al.  Multiagent Reinforcement Learning with Adaptive State Focus , 2005, BNAIC.

[23]  Sandip Sen,et al.  Strongly Typed Genetic Programming in Evolving Cooperation Strategies , 1995, ICGA.

[24]  Ville Könönen,et al.  Asymmetric multiagent reinforcement learning , 2003, Web Intell. Agent Syst..

[25]  Richard S. Sutton,et al.  Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[26]  Michael I. Jordan,et al.  Reinforcement Learning with Soft State Aggregation , 1994, NIPS.

[27]  Sridhar Mahadevan,et al.  Hierarchical multi-agent reinforcement learning , 2001, AGENTS '01.

[28]  Vivek S. Borkar,et al.  An actor-critic algorithm for constrained Markov decision processes , 2005, Syst. Control. Lett..

[29]  Karl Tuyls,et al.  An Evolutionary Dynamical Analysis of Multi-Agent Learning in Iterated Games , 2005, Autonomous Agents and Multi-Agent Systems.

[30]  Keith B. Hall,et al.  Correlated Q-Learning , 2003, ICML.

[31]  Dit-Yan Yeung,et al.  Predictive Q-Routing: A Memory-based Reinforcement Learning Approach to Adaptive Traffic Control , 1995, NIPS.

[32]  Moshe Tennenholtz,et al.  Adaptive Load Balancing: A Study in Multi-Agent Learning , 1994, J. Artif. Intell. Res..

[33]  Shlomo Zilberstein,et al.  Dynamic Programming for Partially Observable Stochastic Games , 2004, AAAI.

[34]  Shin Ishii,et al.  A Reinforcement Learning Scheme for a Partially-Observable Multi-Agent Game , 2005, Machine Learning.

[35]  Milind Tambe,et al.  Intelligent Agents VIII , 2002, Lecture Notes in Computer Science.

[36]  J M Smith,et al.  Evolution and the theory of games , 1976 .

[37]  Yukinori Kakazu,et al.  An approach to the pursuit problem on a heterogeneous multiagent system using reinforcement learning , 2003, Robotics Auton. Syst..

[38]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[39]  Bart De Schutter,et al.  Approximate Dynamic Programming and Reinforcement Learning , 2010, Interactive Collaborative Information Systems.

[40]  Bikramjit Banerjee,et al.  Adaptive policy gradient in multiagent learning , 2003, AAMAS '03.

[41]  Michael L. Littman,et al.  Value-function reinforcement learning in Markov games , 2001, Cognitive Systems Research.

[42]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[43]  Liming Xiang,et al.  Kernel-Based Reinforcement Learning , 2006, ICIC.

[44]  Maja J. Mataric,et al.  Reinforcement Learning in the Multi-Robot Domain , 1997, Auton. Robots.

[45]  Andrew G. Barto,et al.  Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.

[46]  Marco Wiering,et al.  Multi-Agent Reinforcement Learning for Traffic Light control , 2000 .

[47]  Warren B. Powell,et al.  Approximate Dynamic Programming: Solving the Curses of Dimensionality (Wiley Series in Probability and Statistics) , 2007 .

[48]  Craig Boutilier,et al.  Implicit Imitation in Multiagent Reinforcement Learning , 1999, ICML.

[49]  Jordan B. Pollack,et al.  A Game-Theoretic Approach to the Simple Coevolutionary Algorithm , 2000, PPSN.

[50]  John N. Tsitsiklis,et al.  Actor-Critic Algorithms , 1999, NIPS.

[51]  R. Paul Wiegand,et al.  Improving Coevolutionary Search for Optimal Multiagent Behaviors , 2003, IJCAI.

[52]  Y. Narahari,et al.  Reinforcement learning applications in dynamic pricing of retail markets , 2003, EEE International Conference on E-Commerce, 2003. CEC 2003..

[53]  Georgios Chalkiadakis Multiagent reinforcement learning: stochastic games with multiple learning players , 2003 .

[54]  David Carmel,et al.  Opponent Modeling in Multi-Agent Systems , 1995, Adaption and Learning in Multi-Agent Systems.

[55]  Martin A. Riedmiller Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.

[56]  Daniel Kudenko,et al.  Adaptive Agents and Multi-Agent Systems , 2003, Lecture Notes in Computer Science.

[57]  Reda Alhajj,et al.  Multiagent reinforcement learning using function approximation , 2000, IEEE Trans. Syst. Man Cybern. Part C.

[58]  Warren B. Powell,et al.  Approximate Dynamic Programming - Solving the Curses of Dimensionality , 2007 .

[59]  Yoav Shoham,et al.  If multi-agent learning is the answer, what is the question? , 2007, Artif. Intell..

[60]  Pierre Yves Glorennec,et al.  Reinforcement Learning: an Overview , 2000 .

[61]  L. Buşoniu,et al.  A comprehensive survey of multi-agent reinforcement learning , 2011 .

[62]  Corso Elvezia A General Method for Incremental Self-improvement and Multi-agent Learning in Unrestricted Environments , 1996 .

[63]  Richard S. Sutton,et al.  Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[64]  Richard S. Sutton,et al.  Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.

[65]  Q. Henry Wu,et al.  Multi-agent learning for routing control within an Internet environment , 2004, Eng. Appl. Artif. Intell..

[66]  Yoav Shoham,et al.  New Criteria and a New Algorithm for Learning in Multi-Agent Systems , 2004, NIPS.

[67]  Kenneth A. De Jong,et al.  A Cooperative Coevolutionary Approach to Function Optimization , 1994, PPSN.

[68]  Bernard Manderick,et al.  Q-Learning in Simulated Robotic Soccer - Large State Spaces and Incomplete Information , 2002, ICMLA.

[69]  Wenfei Fan,et al.  Keys with Upward Wildcards for XML , 2001, DEXA.

[70]  Mohamed S. Kamel,et al.  Learning Coordination Strategies for Cooperative Multiagent Systems , 1998, Machine Learning.

[71]  Sean Luke,et al.  Cooperative Multi-Agent Learning: The State of the Art , 2005, Autonomous Agents and Multi-Agent Systems.

[72]  採編典藏組 Society for Industrial and Applied Mathematics(SIAM) , 2008 .

[73]  Manuela Veloso,et al.  Multiagent learning in the presence of agents with limitations , 2003 .

[74]  Martin Lauer,et al.  An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems , 2000, ICML.

[75]  Michael L. Littman,et al.  Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach , 1993, NIPS.

[76]  Guillermo Ricardo Simari,et al.  Multiagent systems: a modern approach to distributed artificial intelligence , 2000 .

[77]  Reinhard Männer,et al.  Parallel Problem Solving from Nature — PPSN III , 1994, Lecture Notes in Computer Science.

[78]  Felix A. Fischer,et al.  Hierarchical reinforcement learning in communication-mediated multiagent coordination , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004..

[79]  Dimitri P. Bertsekas,et al.  Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[80]  Martin Zinkevich,et al.  Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.

[81]  Bart De Schutter,et al.  Multi-agent model predictive control for transportation networks: Serial versus parallel schemes , 2008, Eng. Appl. Artif. Intell..

[82]  Claude F. Touzet,et al.  Robot Awareness in Cooperative Mobile Robot Learning , 2000, Auton. Robots.

[83]  Xiaofeng Wang,et al.  Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games , 2002, NIPS.

[84]  Jürgen Schmidhuber,et al.  Reinforcement Learning Soccer Teams with Incomplete World Models , 1999, Auton. Robots.

[85]  Nikos A. Vlassis,et al.  Sparse cooperative Q-learning , 2004, ICML.

[86]  Michael H. Bowling,et al.  Convergence and No-Regret in Multiagent Learning , 2004, NIPS.

[87]  T. Jung,et al.  Kernelizing LSPE(λ) , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.

[88]  Vincent Conitzer,et al.  AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents , 2003, Machine Learning.

[89]  Sandip Sen,et al.  Learning to Coordinate without Sharing Information , 1994, AAAI.

[90]  Nikos A. Vlassis,et al.  Utile Coordination: Learning Interdependencies Among Cooperative Agents , 2005, CIG.

[91]  Ville Könönen,et al.  Gradient Based Method for Symmetric and Asymmetric Multiagent Reinforcement Learning , 2003, IDEAL.

[92]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[93]  E.H.J. Nijhuis,et al.  Cooperative multi-agent reinforcement learning of traffic lights , 2005 .

[94]  D. Fudenberg,et al.  The Theory of Learning in Games , 1998 .

[95]  T. Horiuchi,et al.  Fuzzy interpolation-based Q-learning with continuous states and actions , 1996, Proceedings of IEEE 5th International Fuzzy Systems.

[96]  Manuela M. Veloso,et al.  Team-partitioned, opaque-transition reinforcement learning , 1999, AGENTS '99.

[97]  Nikos Vlassis,et al.  A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence I Mobk077-fm Synthesis Lectures on Artificial Intelligence and Machine Learning a Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence a Concise Introduction to Multiagent Systems and D , 2007 .

[98]  Sandip Sen,et al.  Learning in multiagent systems , 1999 .

[99]  Pierre Geurts,et al.  Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..

[100]  William D. Smart,et al.  Interpolation-based Q-learning , 2004, ICML.

[101]  Xin Yao,et al.  Parallel Problem Solving from Nature PPSN VI , 2000, Lecture Notes in Computer Science.

[102]  Michail G. Lagoudakis,et al.  Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..

[103]  Bart De Schutter,et al.  A Comprehensive Survey of Multiagent Reinforcement Learning , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[104]  Michael P. Wellman,et al.  The 2001 trading agent competition , 2002, Electron. Mark..

[105]  Geoffrey E. Hinton,et al.  Unsupervised learning : foundations of neural computation , 1999 .

[106]  Michael H. Bowling,et al.  Convergence Problems of General-Sum Multiagent Reinforcement Learning , 2000, ICML.

[107]  Jing Peng,et al.  Incremental multi-step Q-learning , 1994, Machine Learning.

[108]  Michael I. Jordan,et al.  MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .

[109]  Csaba Szepesvári,et al.  Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..

[110]  Craig Boutilier,et al.  Planning, Learning and Coordination in Multiagent Decision Processes , 1996, TARK.

[111]  Julie A. Adams,et al.  Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence , 2001, AI Mag..

[112]  Peter Dayan,et al.  Technical Note: Q-Learning , 2004, Machine Learning.

[113]  Colin R. Reeves,et al.  Evolutionary computation: a unified approach , 2007, Genetic Programming and Evolvable Machines.

[114]  Daniel Kudenko,et al.  Reinforcement learning of coordination in cooperative multi-agent systems , 2002, AAAI/IAAI.

[115]  Milind Tambe,et al.  The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories and Models , 2011, J. Artif. Intell. Res..

[116]  Victor R. Lesser,et al.  Learning organizational roles for negotiated search in a multiagent system , 1998, Int. J. Hum. Comput. Stud..

[117]  Maja J. Mataric,et al.  Learning in Multi-Robot Systems , 1995, Adaption and Learning in Multi-Agent Systems.

[118]  Yoav Shoham,et al.  Multiagent Systems - Algorithmic, Game-Theoretic, and Logical Foundations , 2009 .

[119]  Jeffrey O. Kephart,et al.  Pricing in Agent Economies Using Multi-Agent Q-Learning , 2002, Autonomous Agents and Multi-Agent Systems.

[120]  Manuela M. Veloso,et al.  Multiagent Systems: A Survey from a Machine Learning Perspective , 2000, Auton. Robots.

[121]  David G. Luenberger,et al.  Linear and nonlinear programming , 1984 .

[122]  Michael P. Wellman,et al.  Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm , 1998, ICML.

[123]  Andrew W. Moore,et al.  Reinforcement Learning for Cooperating and Communicating Reactive Agents in Electrical Power Grids , 2000, Balancing Reactivity and Social Deliberation in Multi-Agent Systems.

[124]  Craig Boutilier,et al.  The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems , 1998, AAAI/IAAI.

[125]  Peter Dayan,et al.  Q-learning , 1992, Machine Learning.

[126]  Gunes Ercal,et al.  On No-Regret Learning, Fictitious Play, and Nash Equilibrium , 2001, ICML.

[127]  Jae Won Lee,et al.  A Multi-agent Q-learning Framework for Optimizing Stock Trading Systems , 2002, DEXA.

[128]  Bart De Schutter,et al.  Decentralized Reinforcement Learning Control of a Robotic Manipulator , 2006, 2006 9th International Conference on Control, Automation, Robotics and Vision.

[129]  Maja J. Mataric,et al.  Reward Functions for Accelerated Learning , 1994, ICML.

[130]  Akira Hayashi,et al.  A multiagent reinforcement learning algorithm using extended optimal response , 2002, AAMAS '02.

[131]  H. Van Dyke Parunak,et al.  Industrial and practical applications of DAI , 1999 .

[132]  Manuela M. Veloso,et al.  Multiagent learning using a variable learning rate , 2002, Artif. Intell..

[133]  Manuela M. Veloso,et al.  Rational and Convergent Learning in Stochastic Games , 2001, IJCAI.

[134]  Andriy Zapechelnyuk Limit Behavior of No-regret Dynamics , 2009 .

[135]  C. Boutilier,et al.  Accelerating Reinforcement Learning through Implicit Imitation , 2003, J. Artif. Intell. Res..

[136]  Panos M. Pardalos,et al.  Approximate dynamic programming: solving the curses of dimensionality , 2009, Optim. Methods Softw..

[137]  Enrico Pagello,et al.  Balancing Reactivity and Social Deliberation in Multi-Agent Systems , 2001, Lecture Notes in Computer Science.

[138]  DeLiang Wang,et al.  Unsupervised Learning: Foundations of Neural Computation , 2001, AI Mag..

[139]  Thomas Bäck,et al.  Evolutionary algorithms in theory and practice - evolution strategies, evolutionary programming, genetic algorithms , 1996 .

[140]  T. Başar,et al.  Dynamic Noncooperative Game Theory, 2nd Edition , 1998 .

[141]  Shichao Zhang,et al.  AI 2005: Advances in Artificial Intelligence, 18th Australian Joint Conference on Artificial Intelligence, Sydney, Australia, December 5-9, 2005, Proceedings , 2005, Australian Conference on Artificial Intelligence.

[142]  Michael L. Littman,et al.  Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.

[143]  Matthijs T. J. Spaan,et al.  High level coordination of agents based on multiagent Markov decision processes with roles , 2002 .

[144]  Von-Wun Soo,et al.  Market Performance of Adaptive Trading Agents in Synchronous Double Auctions , 2001, PRIMA.

[145]  Dimitri P. Bertsekas,et al.  Least Squares Policy Evaluation Algorithms with Linear Function Approximation , 2003, Discret. Event Dyn. Syst..

[146]  Sean P. Meyn,et al.  An analysis of reinforcement learning with function approximation , 2008, ICML '08.

[147]  José M. Vidal,et al.  Learning in Multiagent Systems: An Introduction from a Game-Theoretic Perspective , 2003, Adaptive Agents and Multi-Agents Systems.

[148]  Peter Stone,et al.  Implicit Negotiation in Repeated Games , 2001, ATAL.

[149]  John N. Tsitsiklis,et al.  Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.

[150]  Toshiyuki Sueyoshi,et al.  An agent-based decision support system for wholesale electricity market , 2008, Decis. Support Syst..

[151]  Jeffrey S. Rosenschein,et al.  Best-response multiagent learning in non-stationary environments , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004..

[152]  Nikos A. Vlassis,et al.  Non-communicative multi-robot coordination in dynamic environments , 2005, Robotics Auton. Syst..

[153]  Robert Fitch,et al.  Structural Abstraction Experiments in Reinforcement Learning , 2005, Australian Conference on Artificial Intelligence.

[154]  Andrew W. Moore,et al.  Prioritized sweeping: Reinforcement learning with less data and less time , 2004, Machine Learning.

[155]  Jürgen Schmidhuber,et al.  Learning Team Strategies: Soccer Case Studies , 1998, Machine Learning.

[156]  Andrew W. Moore,et al.  Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[157]  Andrew G. Barto,et al.  Elevator Group Control Using Multiple Reinforcement Learning Agents , 1998, Machine Learning.

[158]  Gerald Tesauro,et al.  Extending Q-Learning to General Adaptive Multi-Agent Systems , 2003, NIPS.

[159]  Frans C. A. Groen,et al.  Interactive Collaborative Information Systems , 2012, Interactive Collaborative Information Systems.

[160]  Shin Ishii,et al.  Multiagent reinforcement learning applied to a chase problem in a continuous world , 2001, Artificial Life and Robotics.