Game Theory and Multi-agent Reinforcement Learning

Reinforcement learning was originally developed for Markov Decision Processes (MDPs). It allows a single agent to learn a policy that maximizes a possibly delayed reward signal in a stationary stochastic environment. Convergence to the optimal policy is guaranteed, provided that the agent can experiment sufficiently and the environment in which it operates is Markovian. However, when multiple agents apply reinforcement learning in a shared environment, the problem may fall outside the MDP model. In such systems, the optimal policy of an agent depends not only on the environment, but also on the policies of the other agents. These situations arise naturally in a variety of domains, such as robotics, telecommunications, economics, distributed control, auctions, and traffic light control. In these domains multi-agent learning is used either because of the complexity of the domain or because control is inherently decentralized. It is therefore important that agents are capable of discovering good solutions to the problem at hand, either by coordinating with other learners or by competing with them. This chapter focuses on the application of reinforcement learning techniques in multi-agent systems. We describe a basic learning framework based on economic research into game theory, and illustrate the additional complexity that arises in such systems. We also describe a representative selection of algorithms for the different areas of multi-agent reinforcement learning research.
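The dependence of one agent's optimal policy on the policies of the others can be illustrated with a small sketch. The 2x2 coordination game below, its payoff values, and the function names are assumptions made for illustration only and are not taken from the chapter. The sketch computes agent 1's best response to a fixed mixed policy of agent 2 and shows that the best action changes when the other agent's policy changes.

```python
# Hypothetical 2x2 coordination game (illustrative payoffs, not from the chapter).
# PAYOFF[a1][a2] is the reward to agent 1 when it plays a1 and agent 2 plays a2.
PAYOFF = [
    [2.0, 0.0],
    [0.0, 1.0],
]


def expected_payoffs(opponent_policy):
    """Expected reward of each of agent 1's actions against a mixed opponent policy.

    opponent_policy is a list of probabilities over agent 2's actions.
    """
    return [
        sum(prob * reward for prob, reward in zip(opponent_policy, row))
        for row in PAYOFF
    ]


def best_response(opponent_policy):
    """Index of the action that maximizes agent 1's expected reward."""
    values = expected_payoffs(opponent_policy)
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    # The same game, two different policies for agent 2.
    for policy in ([0.9, 0.1], [0.1, 0.9]):
        print(policy, expected_payoffs(policy), best_response(policy))
```

When agent 2 mostly plays its first action, agent 1's best response is its first action; when agent 2 mostly plays its second action, the best response switches. In a learning setting agent 2's policy is itself changing over time, so from agent 1's point of view the environment is no longer stationary, which is the sense in which the problem leaves the single-agent MDP model.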
