Reinforcement Learning and Game Theory for the Coordination of Multi-Agent Systems

This article presents the main reinforcement learning algorithms that aim to coordinate multi-agent systems using tools and formalisms borrowed from game theory. We examine the limits of these approaches in order to identify promising lines of research for the field. We discuss in greater depth the central notions of Nash equilibrium and of games with imperfect monitoring.
